Sample size, in a sports betting context, refers to the number of data items you base your analyses on.
Short-term results from a small sample cannot tell you with much ‘confidence’ how likely a strategy is to succeed long-term.
The more items you have in your sample, the more clearly and accurately you can verify the profitability of your betting strategy.
Sample Size In Betting
In sports betting you can analyse past data to make estimates and find trends.
For instance, you could determine whether or not there was a home advantage in a football league based on the sample of outcomes you analyse. Of course you would expect to find that, over a very large sample, the home team has a clear advantage. But over a small sample of results you cannot accurately conclude anything meaningful about the home advantage.
In other words: the size of the sample determines the precision and level of confidence that you have in your estimates. Essentially, a large sample size increases confidence and reduces uncertainty.
In the remainder of this article I refer to a “test strategy” of recorded bets using real odds. Using this data set I illustrate the importance of using an appropriate sample size. I try to answer one key question: What is the estimated (long-term) yield of the bet selection method?
By estimating the yield using past data I hope to say (with confidence), whether or not the strategy is profitable, and to determine what % ROI I’d expect to make using it going forward.
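To make the term concrete: yield (ROI) is total profit divided by total amount staked. The following is a minimal Python sketch of that calculation. The `Bet` record and the ten sample bets are invented for illustration and are not taken from the test strategy, and exchange commission is ignored.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    stake: float         # amount risked on the bet
    decimal_odds: float  # decimal odds taken, e.g. 11.0
    won: bool            # True if the selection won


def bet_profit(bet: Bet) -> float:
    """Profit (positive) or loss (negative) on a single bet, ignoring commission."""
    if bet.won:
        return bet.stake * (bet.decimal_odds - 1.0)
    return -bet.stake


def strategy_yield(bets) -> float:
    """Yield = total profit / total staked."""
    total_staked = sum(b.stake for b in bets)
    if total_staked == 0:
        raise ValueError("no stakes recorded")
    return sum(bet_profit(b) for b in bets) / total_staked


# Hypothetical record: ten £2 bets at odds of 11.0, one of which wins.
sample_bets = [Bet(2.0, 11.0, True)] + [Bet(2.0, 11.0, False) for _ in range(9)]
print(f"Yield: {strategy_yield(sample_bets):+.2%}")
```

One winner in ten at these odds returns £20 profit against £18 of losses, which is why a handful of extra winners (or losers) can move the yield so sharply in a small sample.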
The Uncertainty Of A Small Sample Size
Determining a “small” or “large” sample size is actually quite difficult. If in doubt, the best option is to simply collect as much data as possible. Take a look at the early results of the test strategy, which performed exceptionally well.
The first 15 days of the test strategy produced the following results:
- Bets: 2,375
- Average odds: 9.95
- Yield: +5.77%
The yield of +5.77% and the positive trend of the graph are promising signs considering a total of 2,375 bets were placed, a seemingly substantial sample size to base future predictions on.
WARNING! Assuming that this strategy is profitable is precisely the danger in analysing past results for betting. The sample is not necessarily representative of the future, as I’ll show in the next section on large sample sizes.
This estimated yield has a level of uncertainty which depends upon the variability of the data as well as the sample size. The more variable the sample, the greater the uncertainty in our estimate.
In this example I have to consider:
- Was the selection method for the bets completely fair?
- Is the initial 15 (consecutive) days representative of all days in the year?
- Have the selections had an uncharacteristically good run of form, or is this success rate normal?
- Are the average odds (at 9.95) capable of producing high variance results that swing in one direction or the other?
- Has the weather favourably, or unfavourably, impacted the results?
I believe that the selection method is fair; it’s formed solely from past racing results. But the other uncertainties listed above, amongst many more, could be significant factors for the positive results observed during the first 15 days.
While it might seem a little cynical to pick apart a winning run, it’s better to be critical of your results and to continue collecting data rather than making naïve assumptions and placing real-stake wagers on them. Failure to fully analyse results can result in real money losses. Remember: larger sample sizes improve the accuracy of the conclusions you have made, and reduce uncertainty.
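To put a rough number on that uncertainty, the sketch below estimates the standard error of the yield for the first 2,375 bets. It rests on simplifying assumptions that are mine rather than part of the test strategy: level stakes, every bet placed at the average odds of 9.95, independent outcomes, and a true win probability equal to 1/odds (a strategy with zero edge). The function names are hypothetical.

```python
import math


def per_bet_sd(decimal_odds: float, win_prob: float) -> float:
    """Standard deviation of the profit per 1 unit staked on a single bet."""
    win_return = decimal_odds - 1.0                      # profit when the bet wins
    mean = win_prob * win_return - (1.0 - win_prob)      # expected profit per unit
    second_moment = win_prob * win_return ** 2 + (1.0 - win_prob)
    return math.sqrt(second_moment - mean ** 2)


def yield_standard_error(n_bets: int, decimal_odds: float, win_prob: float) -> float:
    """Standard error of the yield over n independent level-stake bets."""
    return per_bet_sd(decimal_odds, win_prob) / math.sqrt(n_bets)


AVERAGE_ODDS = 9.95
N_BETS = 2375
OBSERVED_YIELD = 0.0577
ASSUMED_WIN_PROB = 1.0 / AVERAGE_ODDS  # assumption: fair odds, no edge

se = yield_standard_error(N_BETS, AVERAGE_ODDS, ASSUMED_WIN_PROB)
margin = 1.96 * se  # approximate 95% margin of error (normal approximation)
print(f"Standard error of yield: {se:.2%}")
print(f"Approximate 95% range: {OBSERVED_YIELD - margin:+.2%} to {OBSERVED_YIELD + margin:+.2%}")
```

Under these assumptions the standard error is about 6 percentage points, so the approximate 95% range around +5.77% stretches from a clearly negative yield to a strongly positive one. At odds this long, 2,375 bets say very little about the sign of the true yield.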
The Importance Of A Large Sample Size
As I’ve mentioned, the test strategy does indeed take a turn for the worse despite the exceptionally promising start. This was evident from continued data collection under the exact same conditions, taking the sample up to 17,717 bets of £2.
The following graph incorporates the initial 2,375 bets and runs up to a total of 17,717 bets.
- Bets: 17,717
- Average odds: 9.9
- Yield: -0.63%
With the increased sample size we have greater precision. Assumptions we could have made previously from the smaller data set no longer hold up. Crucially, the yield (ROI) settles at -0.63%. The inconsistency in the graph gives no real reason for us to believe that this selection method is profitable.
Theoretically, if we could take this sample to infinity and include every future bet, then we would obtain the true value that we are trying to estimate – the actual yield of the strategy with no uncertainty. This is of course impossible. Nonetheless this sample size of over 17k bets is more than sufficient to base wise decisions around. Given my experience in betting I wouldn’t be rushing to use this betting strategy!
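As a rough guide to how the uncertainty shrinks with more bets, the sketch below estimates how many bets are needed for a given margin of error on the yield. The assumptions are mine: level stakes, all bets at the average odds of 9.9, independent outcomes, a zero-edge strategy (win probability of 1/odds), and a normal approximation. `bets_needed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def bets_needed(margin: float, decimal_odds: float, confidence: float = 0.95) -> int:
    """Bets required so the yield's margin of error is at most `margin`."""
    win_prob = 1.0 / decimal_odds  # assumption: fair odds, zero true edge
    win_return = decimal_odds - 1.0
    # At fair odds the expected profit is zero, so the variance of the
    # per-unit profit equals its second moment.
    variance = win_prob * win_return ** 2 + (1.0 - win_prob)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return math.ceil(z ** 2 * variance / margin ** 2)


for margin in (0.05, 0.02, 0.01):
    print(f"±{margin:.0%} margin needs about {bets_needed(margin, 9.9):,} bets")
```

Under these assumptions a sample of around 17,717 bets narrows the yield to within a few percentage points, while a margin of ±1 point would need several hundred thousand bets. The required sample grows with the square of the precision you want, and it grows faster still at longer odds.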
Although it now looks clear that this strategy is not a profitable one, I am still left with other uncertainties, such as:
- Did something change which meant that the early results did not continue?
- Did competitors on the betting exchange adjust in response to the strategy?
My strategy was run under consistent conditions, over a long period of time. So I suspect the early results were fortunate, and the ROI evened out to where it belonged (around 0%). Although, perhaps, other traders spotted patterns occurring in the markets (i.e. my bets) and flipped the advantage in their favour. Truthfully, it’s hard to say with complete confidence.
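One way to check whether a fortunate start is plausible is to simulate it. The sketch below repeatedly simulates 2,375 level-stake bets for a strategy with no edge at all and counts how often the yield comes out at +5.77% or better purely by chance. It assumes every bet is struck at odds of 9.95 with a win probability of 1/9.95 and that outcomes are independent. The function names and the trial count are illustrative choices of mine.

```python
import random


def simulate_yield(n_bets: int, decimal_odds: float, win_prob: float, rng: random.Random) -> float:
    """Yield of one simulated run of level 1-unit stakes."""
    wins = sum(1 for _ in range(n_bets) if rng.random() < win_prob)
    total_returned = wins * decimal_odds       # each winner returns stake x odds
    return (total_returned - n_bets) / n_bets  # profit divided by total staked


def share_of_lucky_starts(n_bets: int, decimal_odds: float, threshold: float,
                          trials: int = 1000, seed: int = 0) -> float:
    """Fraction of zero-edge simulations whose yield reaches the threshold."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    win_prob = 1.0 / decimal_odds  # assumption: fair odds, no edge
    lucky = sum(
        1 for _ in range(trials)
        if simulate_yield(n_bets, decimal_odds, win_prob, rng) >= threshold
    )
    return lucky / trials


share = share_of_lucky_starts(n_bets=2375, decimal_odds=9.95, threshold=0.0577)
print(f"Zero-edge runs with yield of +5.77% or better: {share:.1%}")
```

If a meaningful share of zero-edge runs reach that yield, then a strong first 15 days is weak evidence of a real edge, which is consistent with the early results simply being fortunate.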
If you encounter this type of scenario you need to look at every aspect of your approach to see if there was anything which might’ve caused a change in the results.
If you’re looking to analyse your own betting results, I highly recommend Betting.com’s Portfolio Tracker. You can log bets, check on the results without any manual input, and verify the profitability of your strategies.
A Step Further: Power & Effect Size
Increasing the sample size gives greater power to detect differences.
Suppose that we were also interested in whether there’s a difference in the proportion of young and old winning horses. We may, for example, believe that older, more experienced horses perform better. We could ask the question:
- Is the observed effect (the difference in results) significant given that the total number of future bets is potentially limitless?
- Or might the higher proportion of older winning horses simply be due to chance?
Without delving too much into the specifics of statistical tests, it’s worth mentioning that you could take things that extra mile by using what’s known as the ‘Binomial test of equal proportions’ or ‘two proportion z-test’. If you find that there is insufficient evidence to establish a difference between young and old horses, then the result is not considered statistically significant. Usually a cut-off level is chosen in advance of performing a test (e.g. 10%) and is called the “significance level”. The test produces a p-value: the probability of seeing a difference at least as large as the one observed if there were really no difference at all. If the p-value is below the significance level then we deem there to be a “difference of significance”.
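For readers who want to see the mechanics, here is a minimal sketch of a two proportion z-test using only the Python standard library. The win counts are made up for illustration and do not come from the test strategy. The test returns a p-value, the probability of seeing a difference at least this large if the two groups really had the same win rate, and that p-value is compared with the chosen significance level.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(wins_a: int, runs_a: int, wins_b: int, runs_b: int):
    """Two-sided z-test for equal win proportions in two groups."""
    p_a = wins_a / runs_a
    p_b = wins_b / runs_b
    pooled = (wins_a + wins_b) / (runs_a + runs_b)  # win rate if the groups were identical
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / runs_a + 1.0 / runs_b))
    z = (p_a - p_b) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: older horses win 120 of 1,000 runs, younger horses 100 of 1,000.
z, p_value = two_proportion_z_test(120, 1000, 100, 1000)
SIGNIFICANCE_LEVEL = 0.10  # chosen before running the test
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Significant" if p_value < SIGNIFICANCE_LEVEL else "Not significant")
```

With these invented counts the two-point gap in win rates is not significant at the 10% level, even with 1,000 runs per group.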
If we increase the sample size of our test strategy to, let’s say, 100,000 bets, we would have more data to support estimates for horses of different ages. Increasing our sample size therefore increases the power that we have to detect the difference. More formally:
Statistical power is the probability of finding a statistically significant result, given that there really is a difference (or effect) in the races.
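The link between sample size and power can be sketched with a standard normal approximation. The win rates below (10% for younger horses, 12% for older horses) are hypothetical, and `approx_power` is an illustrative helper. It ignores the negligible chance of a significant result in the wrong direction.

```python
import math
from statistics import NormalDist


def approx_power(p_a: float, p_b: float, runs_per_group: int, alpha: float = 0.10) -> float:
    """Approximate power of a two-sided two proportion z-test with equal group sizes."""
    se = math.sqrt(p_a * (1.0 - p_a) / runs_per_group
                   + p_b * (1.0 - p_b) / runs_per_group)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # cut-off for the chosen significance level
    return NormalDist().cdf(abs(p_a - p_b) / se - z_crit)


for runs in (1_000, 5_000, 10_000):
    print(f"{runs:>6,} runs per group: power of about {approx_power(0.10, 0.12, runs):.0%}")
```

With a true gap of two percentage points, a test on 1,000 runs per group would miss the difference more often than it finds it, while 10,000 runs per group would detect it almost every time.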
Large sample sizes give more reliable analyses with greater precision and power, but they take more time and effort to collect. Therefore automating data collection, using sources of available data, and investing in sports betting analytics tools are essential for making accurate predictions and assumptions.
The bottom line is: if you want to create and run successful strategies, it’s imperative that you collect and analyse large sets of data.