Many mathematical methods can be used to create football prediction models. I’ve no doubt that you could spend years perfecting a ‘betting system’.
But take it from me: applying statistics to sports betting is not an easy road to making money. With that in mind, the aim of this post isn’t to teach you a step-by-step approach to modelling football odds, but to summarise what I’ve learnt from my own experience.
This is a great starting point for those looking to model their own odds using statistics.
I’ve attempted a few grading systems for football prediction. What I have found is that relatively simple models are effective in forming predictions — but they are somewhat limited in identifying value. Allow me to explain how basic grading systems work…
A grading system is powered by grades (or “groups”) that you assign to something. For example, you might assign ‘1’ or ‘A’ for the top tier category, with ‘2’ or ‘B’ being the second tier category. You may decide to do this for football teams within the same league. How many different grades you decide to use is up to you. However, I recommend searching for a method known as k-means clustering to identify ‘natural’ groups and reduce bias in your model.
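To give a feel for how clustering can pick the groups for you, here is a minimal sketch of one-dimensional k-means in plain Python. It assumes teams are graded on a single number (points per game is used here purely as an example), and the team names, figures and helper names (`kmeans_1d`, `grade_teams`) are all invented for illustration. A dedicated statistics library would normally do this job.

```python
def kmeans_1d(values, k, iterations=100):
    """Group numbers into k clusters with plain 1-D k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Spread the starting centres evenly across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        new_centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def grade_teams(points_per_game, k):
    """Return {team: grade letter}, with 'A' for the strongest cluster."""
    centres = kmeans_1d(list(points_per_game.values()), k)
    ranked = sorted(range(k), key=lambda i: -centres[i])  # best centre first
    letter = {idx: chr(ord("A") + rank) for rank, idx in enumerate(ranked)}
    return {
        team: letter[min(range(k), key=lambda i: abs(ppg - centres[i]))]
        for team, ppg in points_per_game.items()
    }


# Hypothetical points-per-game figures, invented for illustration.
ppg = {"Team 1": 2.3, "Team 2": 2.2, "Team 3": 1.5,
       "Team 4": 1.4, "Team 5": 0.8, "Team 6": 0.7}
grades = grade_teams(ppg, k=3)
print(grades)
```

The number of groups (`k`) is still your decision, so it is worth trying a few values and checking whether the resulting groups look sensible.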
You can grade teams on their ability, determined by past performance. For example you may look back on a ‘window’ of fixtures — such as the previous season — and grade accordingly. Teams such as Leicester (!), Arsenal, Spurs & Man City would probably be labelled a Grade ‘A’ based on the 2015-16 season. Man United could either be a grade ‘A’ or ‘B’.
So How do We Use These Graded Teams for Football Prediction?
The real advantage of grading teams on performance is that it enables us to make generalisations about ‘types’ of teams that face each other. I believe that a lot of casual football bettors use this method to select their bets without actually realising it.
If we assign fair and accurate grades to teams, then we’re able to produce useful statistics on how the results usually play out when teams of different abilities play one another.
To begin doing this you need to download some past data (try football-data.co.uk) and open it up in Excel. From here you’ll have to label the teams with a grading, and then produce the stats.
With some fairly simple stats you can answer historical questions on your graded teams, such as:
- How often did a Grade B team beat a Grade A team?
- How often when the Grade B team was at home?
- How often over the last 3 years?
- How often over the last 3 years when the Grade B team was at home?
And so on. This last question can only be answered by increasing your data set to incorporate more than just one season. But you get the idea.
I recommend first generating a grid of stats for the results of every grade against every other grade. For example, if you have 4 groups (let’s say A, B, C & D) then you have the following 16 fixture ‘types’ to account for.
Every potential fixture type
| A vs A | A vs B | A vs C | A vs D |
| B vs A | B vs B | B vs C | B vs D |
| C vs A | C vs B | C vs C | C vs D |
| D vs A | D vs B | D vs C | D vs D |
Within each of the above 16 fixture types there are 3 possible results: Win, Draw or Lose. This means there are 16 x 3 = 48 total outcomes that you need to calculate percentage values for, based on historical performance.
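A grid like this doesn’t have to be built by hand. As a rough sketch in Python, assuming results are held as (home team, away team, home goals, away goals) tuples and that grades have already been assigned, the percentages could be tallied as follows. The team names, scores and the `outcome_grid` helper are invented for illustration.

```python
from collections import defaultdict


def outcome_grid(matches, grades):
    """Return {(home_grade, away_grade): {'H': pct, 'D': pct, 'A': pct}}."""
    counts = defaultdict(lambda: {"H": 0, "D": 0, "A": 0})
    for home, away, home_goals, away_goals in matches:
        if home_goals > away_goals:
            result = "H"
        elif home_goals < away_goals:
            result = "A"
        else:
            result = "D"
        counts[(grades[home], grades[away])][result] += 1

    grid = {}
    for fixture_type, tally in counts.items():
        total = sum(tally.values())
        # Convert raw counts into percentages that sum to 100 per fixture type.
        grid[fixture_type] = {r: 100 * n / total for r, n in tally.items()}
    return grid


# Hypothetical results: (home team, away team, home goals, away goals)
matches = [
    ("Alpha", "Beta", 2, 0),
    ("Alpha", "Beta", 1, 1),
    ("Beta", "Alpha", 0, 3),
    ("Beta", "Alpha", 2, 1),
    ("Alpha", "Beta", 3, 1),
    ("Alpha", "Beta", 0, 1),
]
grades = {"Alpha": "A", "Beta": "B"}
grid = outcome_grid(matches, grades)
for fixture_type, percentages in sorted(grid.items()):
    print(fixture_type, percentages)
```

With real data, keep an eye on how many matches sit behind each cell: a percentage built on a handful of fixtures tells you very little.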
From this point onward you can add more factors to the football prediction model in order to tweak the percentages. The idea is that adding more complexity improves the accuracy of the predictions.
It's essential that you're competent in Excel (or a comparable program) in order to produce a stats-based betting model.
Using Your Statistics to Create Football Prediction Odds
If you’ve produced stats in percentage format then translating them into odds is simple. What follows goes for any sports betting model, and any sport. This isn’t unique to grading systems.
For example, you may have found that 35% of the time a Grade B team beats a Grade A team at home, 20% of the time it was a draw, and 45% of the time the away team won. The most important thing to remember here is that whenever you produce percentage stats for the 3 outcomes of a football match, they must add up to 100%. In this case it’s: 35% + 20% + 45% = 100%.
If you want to predict an upcoming fixture based on those percentages, then you convert the % stats into decimal odds. The formula is:
1 / (chance of each outcome based on past data, written as a decimal, e.g. 35% = 0.35)
What we get in the above scenario is: 2.86 (H) : 5.0 (D) : 2.22 (A)
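The same conversion is easy to script. Here is a minimal sketch in Python, where `fair_decimal_odds` is a made-up helper name, the input is the percentage for each outcome, and the result is rounded to two decimal places:

```python
def fair_decimal_odds(percentages):
    """Convert outcome percentages (summing to 100) into decimal odds."""
    total = sum(percentages.values())
    if abs(total - 100) > 0.01:
        raise ValueError(f"percentages sum to {total}, expected 100")
    if any(pct <= 0 for pct in percentages.values()):
        raise ValueError("every outcome needs a chance above 0%")
    # 100 / percentage is the same as 1 / probability.
    return {outcome: round(100 / pct, 2) for outcome, pct in percentages.items()}


odds = fair_decimal_odds({"H": 35, "D": 20, "A": 45})
print(odds)  # 1 / 0.35 = 2.857..., 1 / 0.20 = 5.0, 1 / 0.45 = 2.222...
```

Note that these are ‘fair’ odds with no margin built in, which is why the three percentages have to sum to exactly 100.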
Now You’ve Got Estimated Odds. What’s Next?
Well, if you believe that your estimated odds are predicting results accurately, then you may want to use them as the basis for finding value bets. Backing at higher odds than your estimates imply would be a value bet. Laying at lower odds than your estimates would be a value bet, too.
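To put a number on ‘value’, you can compare your estimated odds with the price on offer. The sketch below is a simplified illustration: it assumes your estimate is the true fair price, and it ignores exchange commission and staking. The function names are invented.

```python
def back_edge(fair_odds, offered_odds):
    """Expected profit per unit staked when backing at offered_odds."""
    # Win (offered_odds - 1) with probability 1 / fair_odds, otherwise lose the stake.
    return offered_odds / fair_odds - 1


def lay_edge(fair_odds, offered_odds):
    """Expected profit per unit of the backer's stake when laying at offered_odds."""
    # Keep the backer's stake if the selection loses, pay out (offered_odds - 1) if it wins.
    return 1 - offered_odds / fair_odds


# Hypothetical prices: your estimate is 2.0.
print(back_edge(2.0, 2.2))  # positive: backing above your estimate
print(lay_edge(2.0, 1.8))   # positive: laying below your estimate
print(back_edge(2.0, 1.8))  # negative: backing below your estimate
```

A positive figure only means value if your estimated odds are better than the market’s, which is the hard part.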
But There Are Weaknesses With Grading Systems. Here Are Some of Them…
- Runs of form: if you analyse only a small window of historical data (e.g. the last 4 matches), then you’re liable to make weak predictions based on short-lived winning/losing streaks.
- Teams of the same Grade are treated as ‘equal’: for example, some Grade A teams may in fact be superior to others in their group. Generalisations like this weaken the predictions, so keep them to a minimum.
- Group structures change between seasons: In one season there may be a well-defined number of groups. Currently in the Premier League it’s well recognised that there’s a Top 6. But that’s not always the case.
Realistically, basic grading systems are a little too simplified to identify value in the Premier League. You can break even with a relatively simple approach, but you’d have to improve the concept, and incorporate more factors & influences for it to identify value. However, the skills required to create odds from this approach will prove invaluable to other methods listed in this post.
Rule-based betting systems can be used in conjunction with a grading system, or any other betting system for that matter. The ‘Rules’ are used in order to decide, or restrict, what bets you place. To create rules you want to look for patterns in past data.
The Power of Hindsight… Or Not?
When you analyse past data, in hindsight you’ll be able to identify a combination of ‘rules’ that would have turned you a profit if you’d placed bets on those selections. It’s surprisingly easy to do this. However, I should warn you not to get excited too quickly. Things aren’t always what they seem…
Imagine you’re playing Sonic. In theory, there’s a combination of buttons that you could press at precisely the right time that will get him through a level — where Sonic’s never hit by spikes or villains, avoids falling down a hole, and doesn’t drown. The string of buttons may be outrageously complex and far-fetched, but it’s possible to work it out if you play through a level enough times. Let’s suppose it was something like “hold down Right and hit ‘A’ at precisely 3, 10, 17, 21, and 34 seconds into the level”.
This button combination may work perfectly on one level.
Try repeating that same button combination on the next level. Does Sonic get through without dying?
No, of course not — because we’ve basically manufactured one death-dodging combination which doesn’t apply to the rest of the game.
So what am I getting at?
Well, with football it’s easy enough to analyse past data and (naively) identify a pattern. It could be something like “so far this season Chelsea have won every away game where bookmakers offered more than 3.0 odds at kick-off and they drew their previous fixture”. Or “so far this season Spurs have beaten every team that lost their previous 2 home fixtures”. Even if these statements are true, and betting on those specific selections would have made you money, the question remains: have you found value?
There’s no guarantee that the ‘rules’ you’ve applied are onto a winning trend. Using very specific rules to select bets often has no advantage, and the presumed ‘trend’ doesn’t continue in the way we’d hoped. This dilemma is known as data over-fitting and it’s precisely the danger in drawing conclusions from past data.
Tips to Avoid Over-Fitting Your Data
- Don’t make your rules too strict. If you’re too specific then you’ll end up making weak assumptions from a small subset of data.
- Always ensure that you analyse a large set of data. For more information read my post: The Paramount Importance Of Sample Size In Betting Analysis
- Ask yourself: do my rules make sense? Keep an open mind, but scrutinise your rules as well. Ideally there’s logic behind them.
It’s easy to be blinkered by our own analysis — especially when it seems to show huge profits. But if you follow these 3 steps then you stand a much better chance of using ‘rules’ effectively for football prediction.
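One practical way to test a rule is to find it on one set of seasons and then check it on seasons it has never seen. The sketch below assumes level stakes and a made-up bet format (a dict with `season`, `odds` and `won`). The data is invented purely to show the mechanics.

```python
def rule_profit(bets, rule):
    """Level-stakes profit and number of bets from backing everything the rule selects."""
    profit = 0.0
    placed = 0
    for bet in bets:
        if rule(bet):
            placed += 1
            profit += (bet["odds"] - 1) if bet["won"] else -1
    return profit, placed


def in_and_out_of_sample(bets, rule, split_season):
    """Evaluate the rule before split_season (where it was found) and from it onwards."""
    earlier = [b for b in bets if b["season"] < split_season]
    later = [b for b in bets if b["season"] >= split_season]
    return rule_profit(earlier, rule), rule_profit(later, rule)


# Hypothetical bets, invented for illustration.
bets = [
    {"season": 2014, "odds": 3.5, "won": True},
    {"season": 2014, "odds": 3.2, "won": True},
    {"season": 2014, "odds": 2.0, "won": False},
    {"season": 2015, "odds": 3.4, "won": False},
    {"season": 2015, "odds": 3.1, "won": False},
    {"season": 2015, "odds": 3.6, "won": False},
]


def rule(bet):
    """Example rule: only back selections priced above 3.0."""
    return bet["odds"] > 3.0


found_on, checked_on = in_and_out_of_sample(bets, rule, split_season=2015)
print("Seasons used to find the rule:", found_on)
print("Later seasons:", checked_on)
```

In this invented example the rule looks profitable on the season it was found in and loses on the next one, which is exactly the pattern over-fitting produces. Real samples need to be far larger than this.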
Here’s an approach that takes things up a notch. By incorporating historical data, the Poisson distribution provides a method for calculating the likely number of goals that will be scored in a football match.
The good news is that you don’t need to fully understand the Poisson Distribution concept to use it. In fact, Microsoft Excel will work out Poisson automatically. All you really need to know is that it can be used to calculate the probability of outcomes for a football match in goal-based markets such as Match Odds (1×2), Correct Score, Over / Under Match Goals, Both Teams To Score and Asian Handicap.
I’ve worked on a football prediction project involving the Poisson Distribution. I found that although it has its limitations and faults, applying Poisson is a very useful approach to understanding the fundamentals of creating your own odds. I much prefer using this method to some of the basic grading systems described earlier in this post, due to the fact that you don’t generalise by ‘grouping’ teams together.
The Basics of the Poisson Distribution for Football Prediction
Pinnacle has a useful entry-level article on how to use the Poisson Distribution here. I’ll elaborate on some of the key points.
To start off you’ll need to download historical results to calculate the average number of goals each team scores and concedes within your chosen timeframe (e.g. one season), for both home and away games. These averages are compared to the league average and used to create values for attacking strength and defensive strength for every team.
The figures for attack and defence are easily calculated by dividing Average Goals For or Average Goals Against by the league average. For example, if the average Goals For in the Premier League is 1.45 and Man City has an average of 1.97, then they are roughly 36% above the league average for attack, meaning they’re a goal scoring threat. Here’s how that’s calculated:
1.97 / 1.45 ≈ 1.36
1.36 = 136%
136% – 100% = 36% above average.
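The same arithmetic in Python, as a small sketch (the `strength` helper is a made-up name, and the figures are the ones from the example above):

```python
def strength(team_average, league_average):
    """Ratio of a team's average to the league average (1.0 = exactly average)."""
    return team_average / league_average


attack = strength(1.97, 1.45)          # 1.97 / 1.45 = 1.3586...
percent_above = (attack - 1) * 100     # roughly 36% above the league average
print(round(attack, 2), round(percent_above))
```

For defence the same ratio is used with Goals Against, where a figure below 1.0 means the team concedes fewer than average.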
These metrics, along with the opponent’s, are put into a Poisson Distribution formula. This works out the probability of every result when two teams face each other. These % probabilities can be converted to odds using the method I showed earlier in this post, and then used to identify where there is value at a bookmaker or exchange.
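For anyone who would rather script this than use Excel, here is a minimal sketch in Python. It assumes the common approach of multiplying one team’s attack strength by the opponent’s defence strength and the league average to get each side’s expected goals, and it treats the two teams’ goal counts as independent. All the input figures and function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return mean ** k * exp(-mean) / factorial(k)


def match_probabilities(home_mean, away_mean, max_goals=10):
    """Probabilities for a few goal-based markets from two independent Poissons."""
    goals = range(max_goals + 1)
    # grid[i][j] = P(home scores i and away scores j)
    grid = [[poisson_pmf(i, home_mean) * poisson_pmf(j, away_mean) for j in goals]
            for i in goals]
    return {
        "home": sum(grid[i][j] for i in goals for j in goals if i > j),
        "draw": sum(grid[i][i] for i in goals),
        "away": sum(grid[i][j] for i in goals for j in goals if i < j),
        "over_2_5": sum(grid[i][j] for i in goals for j in goals if i + j > 2),
        "btts": sum(grid[i][j] for i in goals for j in goals if i > 0 and j > 0),
    }


# Hypothetical strengths and league averages, invented for illustration.
home_attack, home_defence = 1.36, 0.80
away_attack, away_defence = 0.95, 0.90
league_home_goals, league_away_goals = 1.50, 1.15

home_mean = home_attack * away_defence * league_home_goals
away_mean = away_attack * home_defence * league_away_goals

probs = match_probabilities(home_mean, away_mean)
for market, p in probs.items():
    print(f"{market}: {p:.3f} (fair odds {1 / p:.2f})")
```

Scorelines above `max_goals` are ignored, so the home, draw and away figures sum to very slightly less than 1. Raising the cap shrinks that gap further.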
Whilst this method is likely to produce fairly accurate football predictions, you shouldn’t assume that other people aren’t using it already — because they are. Collectively the market incorporates all the people using this approach and thousands of other methods — no matter how simplistic or complex. Therefore this distribution can only really be seen as the basis of your model.
Again, I encourage you to read the Pinnacle article for a full understanding of the calculations.
How Many Games Can We Use to Calculate the Goal Expectation Figures?
You need to experiment with this for yourself. Consider that teams such as Leicester have varied so greatly that a large window of, let’s say, 5 seasons may not produce stats that are truly representative of them right now. Also, a very small window of games (e.g. the past 3 fixtures) doesn’t provide you with much data to work with. It’s a tough one to call, but in my experience, from around 10 games into the new season you have at least something current to work with.
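If you want to experiment with different windows, a small helper makes it easy to compare them side by side. This is only a sketch: `average_goals` is a made-up name, and the goal tallies are invented, listed oldest match first.

```python
def average_goals(goals, window=None):
    """Mean of the most recent `window` matches (all matches if window is None)."""
    if window is not None and window <= 0:
        raise ValueError("window must be a positive number of matches")
    recent = goals if window is None else goals[-window:]
    if not recent:
        raise ValueError("no matches to average")
    return sum(recent) / len(recent)


# Hypothetical goals scored per match, oldest first.
goals_scored = [0, 1, 1, 2, 0, 3, 2, 1, 4, 2]

for window in (3, 5, 10):
    print(window, "matches:", round(average_goals(goals_scored, window), 2))
```

Comparing how much the average moves between a short and a long window gives a quick sense of how sensitive your goal expectation figures are to the choice.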
Weaknesses In the Poisson Distribution for Football Prediction
Like most stats-based approaches to betting, this only considers the (measurable) results. But we’ve all seen plenty of games where a team dominated a match but failed to score. Or where the dominant team even lost the match via an unexpected goal e.g. a late penalty. Match results tell us the final score, but do not tell us what actually happened during the game.
Another weakness is that Poisson is widely believed to underestimate the probability of draws and of a team scoring zero goals. This can, however, be rectified using a method known as zero-inflation, which increases the probability of no goals.
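In its simplest form, zero-inflation mixes an extra chance of ‘no goals’ into the ordinary Poisson probabilities. The sketch below shows that idea for one team’s goal count. The extra share (`zero_share`) is a hypothetical figure that you would need to estimate from your own data, and this is a bare-bones illustration rather than a full model.

```python
from math import exp, factorial


def zero_inflated_pmf(k, mean, zero_share):
    """P(k goals) when a share of outcomes is forced to zero and the rest follow Poisson."""
    poisson = mean ** k * exp(-mean) / factorial(k)
    if k == 0:
        return zero_share + (1 - zero_share) * poisson
    return (1 - zero_share) * poisson


# Hypothetical figures: 1.4 expected goals and an extra 5% chance of a blank.
plain_zero = exp(-1.4)
inflated_zero = zero_inflated_pmf(0, 1.4, 0.05)
print(round(plain_zero, 3), round(inflated_zero, 3))
```

Every other scoreline is scaled down slightly so that the probabilities still sum to 1.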
Combining Expected Goals (xG) Data With The Poisson Distribution
Poisson could be vastly improved using a more sophisticated statistic known as Expected Goals (xG). xG stats quantify the quality of attempts on goal. This cuts through the sentiment and evaluates performances from a scientific standpoint. Using it in your football betting model can improve your accuracy and, in turn, your expected value (EV).
Do Any Of These Football Prediction Methods Actually Work?
I’ve focused on the weaknesses of these stat-based football prediction approaches. This is because I’ve learnt that the top flight football betting markets are particularly difficult to find consistent value from. I’m basing this on my own findings, through various experimental projects, over a lot of data.
There’s a Lot to Consider With Football Prediction…
Compare football to other sports, like horse racing, where past stats are far more relevant to an upcoming event. Firstly, the horse is the same (albeit a bit older than in its previous race). Weather aside, the tracks remain the same. Most of the time the jockeys and trainers are the same, too. It’s consistent.
But even more consistent than horse racing are single-person sports, like darts or bowling, where there are only 2 outcomes to a match, and opponents never physically impact one another. Darts players effectively play the same board that they’ve always played, often for decades on end, in every single fixture.
Now consider football, where no two leagues (or even seasons) are alike. The squad, the starting 11, the managers & coaches, and even stadiums frequently change. Then there are injuries, player bans, relegations & promotions, and transfers to account for. How can our betting models possibly keep up?
It’s Complex, But There Is Hope…
Just because there are complexities in football doesn’t mean to say that stats-based approaches can’t work for you. Note that football betting offers an enormous array of leagues and markets across bookmakers & the betting exchanges. So you can be highly selective without compromising on turnover.
Football also has more interference, more hype, and more noise surrounding the game than any other sport. There are countless variables that influence the odds. As a result, the markets often neglect what really has an impact on the game itself. And this is precisely why some stats-based models thrive.
Remember: statistics aren’t influenced by gossip from pundits, tabloids or the morons on Twitter!
So What Really Influences Football?
That’s a good question. I have some views on this, which I’ve shared in my posts:
The ‘Perfect’ Strategy
I admit that I have always leaned towards using ‘cold’ market-based approaches to sports betting. I’m a strong believer that the market gets the odds right (on average) by compiling the opinions of thousands of others. Therefore I am much more inclined to allow others to create the odds by applying methods such as the ones outlined in this article. Instead I’d look to predict the direction of the prices in the market, and try to grab a good, early price.
The ‘perfect’ betting model, in my opinion, is one that’s able to respond quickly to news. It needs to distinguish the hype from what’s really having an impact.
While this is challenging from a programming perspective, there are trading platforms capable of doing this in the financial markets. Who knows, perhaps it’s already been developed for sports…
I’m not too sure that I’d ever bother trying anything with the EPL. I’d want to dive straight into some obscure foreign league/market. Much more chance of finding decent odds there.
Sounds like a good approach. However, sometimes those obscure markets lack liquidity and the participants tend to be people like you i.e. looking for value opportunities as opposed to a punt. Swings and roundabouts. Striking the right balance is important.
I would like to suggest a combined strategy between the Grading System and the Poisson Distribution.
First of all, I would use the Total Shots in the Box Ratio (TSBR) to grade a football team.
Since about 85% of goals come from shots in the box, it helps to grade each team based on their ability to make and avoid shots in the box.
In this way, you can assign a group to each team according to its TSBR and see how the results usually play out when teams of different groups play one another.
Then, you can put a result for a specific fixture type in a probability calculator based on Poisson distribution, to get the “fair odds” for that match.
From here, just like the article says, you can back at higher odds than your estimates and lay below your estimates.
Finally, another weakness I see concerns the probability of a draw, given that the Poisson distribution calculates it with a slight imprecision.
However, the data set available for each potential fixture type can help here, as you can use it to calculate the percentage of draws.
How many draws happened between the Group A team and the Group B team?
Especially when the sample size is very large and you have at least 150 results for a fixture type.
I completely missed your comment here, so I’m a year late to respond.
But thank you for your input – you’ve given it a lot of thought and I’m sure it’ll help someone out.
TSBR data could be easier to obtain than xG. Plus there’s no ambiguity about it. xG would also differ depending on how it was calculated (i.e. how chances are assessed).
However, using TSBR would lose some accuracy by assuming all shots in the box are equal. In reality a shot from two yards stands a much better chance than an ambitious (or simply weak) effort from the far edge of the box that would require a lot more skill to convert.
These kinds of things are well worth trying though, if only to compare results from different approaches.
Please check my comment below about the Orio Sports course.
How are you?
I would like to buy Orio Sports course, but, their website is down.
I have no idea whether they are still offering the course.
On top of that, what is the best way to get xG data? Buying the course won’t be useful if I cannot get xG data.