Win Probability Over Time

It is often said that in football a two-goal lead will always yield a victory. But is that really true, and if so, to what extent? That is the aim of this study, based on seven seasons of data.
What we will do is analyze every possible situation, from trailing by 6 to leading by 6. Of course, timing plays a huge role. Because my data set is relatively small, I decided to group the results into 5-minute brackets. This will let us put a value on a goal in terms of winning chances. Since a draw is possible in football, I will have two graphs for each game situation: the first with the win and draw percentages, and the second with a single figure, the Winning%, computed with a simple formula.

This gives us a good idea of a team’s chances of securing the three points. On this second graph, a basic trend curve will let us calculate the Winning% at any point in a game. Using the set of equations obtained for all the possible score situations, we will then determine the MarginalWinning%, that is, the amount of winning percentage added by scoring a goal.
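Here is a minimal Python sketch of the bookkeeping described above. The function names are hypothetical. The Winning% weighting in it, with a draw counted as half a win, is only an assumption, since the exact formula is not reproduced in the text. The brackets are assumed to run 1-5, 6-10, 11-15 and so on.

```python
def bracket(minute):
    """Return the (start, end) 5-minute bracket containing a match minute.

    Assumes minutes are counted from 1, so minutes 16..20 share one bracket.
    """
    start = (minute - 1) // 5 * 5 + 1
    return (start, start + 4)


def winning_pct(wins, draws, losses, draw_weight=0.5):
    """Combine wins and draws into one share between 0 and 1.

    draw_weight=0.5 (a draw counts as half a win) is an ASSUMPTION,
    not necessarily the formula used for the graphs.
    """
    total = wins + draws + losses
    return (wins + draw_weight * draws) / total


def marginal_winning_pct(before, after):
    """Winning% added by a goal: value after the goal minus value before."""
    return after - before
```

With these definitions, `bracket(19)` and `bracket(21)` fall in two different brackets, (16, 20) and (21, 25). `marginal_winning_pct` is simply the gap between two Winning% values read at the same minute for two neighbouring score situations.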
Let us start our journey.

First part: Home team is trailing by 1

I’m a Liverpool fan, I admit it, and I don’t like the idea of losing. So whenever the away team scores first at Anfield, my mouth is full of insults for the player who scored or the damn referee who robbed us! But my head is still active even after a few pints, and says something like: don’t worry, Suarez is on the pitch. If he doesn’t dive, he’ll surely score twice. In fact he does score, from a generous penalty. But after 90 minutes of poor football, my team loses. Then everybody says they were close to winning this game. As a statistics enthusiast, I want to know by how much we missed the comeback.

Over the 7 seasons of data I gathered, 4 from French Ligue 1 and 3 from the English Premier League, there are 1,428 cases of the home team trailing by 1. Keep in mind that a single game can contribute more than one occurrence. In a 3-2 game where the score goes 0-1, 1-1, 1-2, 2-2, 3-2, the home team was trailing by 1 twice. These events happened at two different points of the game, so at different minutes and with different odds. Again, the data are grouped into 5-minute brackets, to get smoother curves and to compensate for the lack of data. The win and draw chances can be found on the next graph.
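The counting rule in the 3-2 example can be sketched in a few lines of Python. This is only an illustration: the function name and the goal minutes are made up, and it assumes that an occurrence is recorded every time the goal difference becomes exactly -1 for the home team, whether that happens by conceding or by pulling one back from two down.

```python
def trailing_by_one_minutes(goals):
    """Return the minutes at which the home team starts trailing by exactly 1.

    goals: list of (minute, side) in match order, side is "H" (home) or "A" (away).
    """
    diff = 0  # home goals minus away goals
    minutes = []
    for minute, side in goals:
        diff += 1 if side == "H" else -1
        if diff == -1:
            minutes.append(minute)
    return minutes


# Hypothetical minutes for the sequence 0-1, 1-1, 1-2, 2-2, 3-2
goals = [(12, "A"), (30, "H"), (55, "A"), (70, "H"), (88, "H")]
occurrences = trailing_by_one_minutes(goals)  # two occurrences: minutes 12 and 55
```

Each returned minute would then be assigned to its 5-minute bracket, together with the final result of the game.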

The trend for winning is clear, and obvious: as time goes by, the chances of winning shrink. We can spot an odd point for the [16-20] minute bracket. At first I thought I had made a mistake while building the graph, but I double-checked and still couldn’t explain it. So I took a look at the data and what a surprise! Of the 105 games that fit the conditions, only 5 ended in a win, 27 ended with one point and the other 73 were lost. For the moment I have no explanation for this sudden drop. Maybe it is a psychological effect, but how can one explain that a team would have a better chance of winning at minute 21 than at minute 19? As I have no reasonable explanation, I can only make hypotheses. My assumption is that managers are aware of this drop around the middle of the first half, so a quarter of the way into the game. Knowing that, a manager might give instructions to change tactics. This could explain the peak in the next time bracket. The same phenomenon appears around the eightieth minute, the "money time" as it is called in many American sports. I need to dig into this and check whether the odd point still appears in other conditions.
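To put numbers on that odd bracket, here is the arithmetic for the 105 games in Python. The standard error at the end is my own rough addition, not part of the original graph. It treats every occurrence as an independent trial, which is an assumption, and it is only one quick way to judge how much noise to expect from a sample of this size.

```python
from math import sqrt

wins, draws, losses = 5, 27, 73   # counts quoted above for the [16-20] bracket
games = wins + draws + losses     # 105

win_share = wins / games          # about 4.8%
draw_share = draws / games        # about 25.7%
loss_share = losses / games       # about 69.5%

# Rough binomial standard error of the win share
# (ASSUMPTION: occurrences are independent trials)
win_std_error = sqrt(win_share * (1 - win_share) / games)  # about 2 percentage points
```

So the win share in this bracket is about 4.8%, give or take roughly 2 percentage points under that simple assumption.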

An interesting thing to note is the almost constant value around half time. Fatigue could explain this, along with the fact that there is an observation period at the start of each half. My guess is a 5- to 10-minute window.

As for the draw, the trend is also a percentage that shrinks as time goes by. The curve is not very smooth, but it stays between 15% and 30% almost all the time.
Now let us look at the Winning%.


The trend is once again clear, and we still have the odd 20-minute point. One might call it the twentieth-minute dilemma: should the manager try to do something or leave it to luck?
The curve does not fit the data very well (R² is only about 0.70), but it will be useful for further analysis.
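For readers who want to reproduce a trend curve and its R² outside Excel, here is a minimal Python sketch. The type of trend curve is not specified here, so the straight line below is only a stand-in. The function names and the sample values are made up for illustration and are not my data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up values: bracket midpoints (minutes) and a Winning% for each bracket
midpoints = [3, 8, 13, 18, 23]
winning = [0.30, 0.27, 0.26, 0.18, 0.24]

slope, intercept = fit_line(midpoints, winning)
fit_quality = r_squared(midpoints, winning, slope, intercept)
```

An R² close to 1 means the curve follows the points closely. A value around 0.70, as here, leaves a fair amount of scatter, and a single outlier such as the 20-minute point drags it down.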
I know this kind of analysis already exists around the web, but it’s a good starting point for building my own statistics. It’s also a good way to test my model against others. Still, I think this is a good basis for putting a number on a goal. It would help us compare Van Persie to a less prolific striker. But that’s another story I hope to tell you about later.
 
So I guess that’s it for this first short analysis. Feel free to ask questions or leave comments. My English is not at an advanced level, so I apologize for any mistakes. I also want to point out that, as an undergraduate student, I don’t have access to the high-level tools I have seen in some articles. Finally, my Excel skills are slightly above average, but I don’t know anything about VBA, so don’t expect fancy things. I hope to be back soon.

