Goals created and Wins created

Like many others, I got hold of the MCFC data set. I used it to toy with the data and created two new statistics. In fact three, as I'll explain later.

The main thing I tried to do is to assess the impact of a player on the season in terms of scoring opportunities and wins. So first of all, I did some computation, and selected the data I needed:
 - Appearances
 - Assists
 - Big Chances
 - Goals
 - Key Passes
 - Time Played
 - Shots

I have to say that I'm not too happy with the data set. Many events are not recorded with a player name, so some of the data cannot be used when you look at player production. But I managed to do what I wanted with the data. All data come from the MCFC data set, and I rely on it for accuracy.
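The selection step can be sketched roughly as follows. This is only an illustration: it assumes the data comes as one row per player per match, the column names are hypothetical (the real MCFC export may label them differently), and the rows are made up.

```python
from collections import defaultdict

# Hypothetical column names; the real MCFC export may label them differently.
STAT_COLUMNS = ["appearances", "assists", "big_chances", "goals",
                "key_passes", "time_played", "shots"]


def season_totals(rows):
    """Sum per-match rows into one season line per player.

    Rows without a player name are skipped, since they cannot be
    attributed to anyone.
    """
    totals = defaultdict(lambda: dict.fromkeys(STAT_COLUMNS, 0))
    for row in rows:
        name = row.get("player")
        if not name:
            continue
        for col in STAT_COLUMNS:
            totals[name][col] += row.get(col, 0)
    return dict(totals)


# Made-up rows, for illustration only.
rows = [
    {"player": "Player A", "appearances": 1, "goals": 2, "shots": 5, "time_played": 90},
    {"player": "Player A", "appearances": 1, "assists": 1, "key_passes": 3, "time_played": 75},
    {"player": "", "goals": 1, "shots": 1},  # unnamed event: ignored
]
totals = season_totals(rows)
```

The unnamed row is dropped rather than guessed at, which mirrors the limitation described above.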


First thing I looked at: top scorers.


| Rank | Player full name | Goals |
|---|---|---|
| 1 | Robin van Persie | 30 |
| 2 | Wayne Rooney | 27 |
| 3 | Sergio Agüero | 23 |
| 4 | Clint Dempsey | 17 |
| 5 | Emmanuel Adebayor | 17 |
| 6 | Demba Ba | 16 |
| 7 | Grant Holt | 15 |
| 8 | Edin Dzeko | 14 |
| 9 | Papiss Demba Cissé | 13 |
| 10 | Mario Balotelli | 13 |
| 11 | Danny Graham | 12 |
| 12 | Steven Fletcher | 12 |
| 13 | Luis Suárez | 11 |
| 14 | Rafael van der Vaart | 11 |
| 15 | Frank Lampard | 11 |
| 16 | Daniel Sturridge | 11 |
| 17 | Jermain Defoe | 11 |
| 18 | Peter Crouch | 10 |
| 19 | Javier Hernández | 10 |
| 20 | Peter Odemwingie | 10 |
| 21 | Gareth Bale | 9 |
| 22 | Danny Welbeck | 9 |
| 23 | Steve Morison | 9 |
| 24 | Nikica Jelavic | 9 |
| 25 | Darren Bent | 9 |

But this table doesn't really help. Van Persie scored 30 goals, but what if all 30 were scored at home when Arsenal already led 3-0? Remember the data are for the 2011/2012 season, so he's still at Arsenal. The table gives us no idea of how much those goals contributed to Arsenal's success. Peter Crouch scored 10 goals, but maybe all 10 were winners. So my idea was to calculate how many goals each player created and how many wins were derived from these goals.

I created a new statistic called Goals Created. The formula is simple yet comprehensive:

                Goals Created = (Key Passes + Assists) * Chance Conversion Rate
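To make the formula concrete, here is a minimal sketch in Python. It assumes the Chance Conversion Rate is a team-level ratio of goals scored to chances, with chances counted as key passes plus assists. The function names and every number are made up for illustration.

```python
def chance_conversion_rate(team_goals, team_chances):
    """Share of a team's chances that ended in a goal (assumed definition)."""
    if team_chances == 0:
        return 0.0
    return team_goals / team_chances


def goals_created(key_passes, assists, conversion_rate):
    """Goals Created = (Key Passes + Assists) * Chance Conversion Rate."""
    return (key_passes + assists) * conversion_rate


# Made-up numbers, for illustration only.
rate = chance_conversion_rate(team_goals=50, team_chances=200)
example = goals_created(key_passes=50, assists=10, conversion_rate=rate)
print(rate, example)  # 0.25 15.0
```

Because the rate is computed per team, the same number of chances created is worth more on a team that finishes well than on one that wastes its chances.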

In fact it is just Chances Created * Chance Conversion Rate. This last value is calculated by dividing each team's number of goals scored by its total chances, so it is a kind of efficiency rating. So now the top 25 goal creators:


| Rank | Player full name | Goals created |
|---|---|---|
| 1 | David Silva | 17 |
| 2 | Robin van Persie | 14 |
| 3 | Juan Mata | 13 |
| 4 | Samir Nasri | 13 |
| 5 | Luka Modric | 12 |
| 6 | Gareth Bale | 10 |
| 7 | Morten Gamst Pedersen | 10 |
| 8 | Stéphane Sessegnon | 10 |
| 9 | Mikel Arteta | 9 |
| 10 | Rafael van der Vaart | 9 |
| 11 | Yohan Cabaye | 9 |
| 12 | Sergio Agüero | 9 |
| 13 | Wayne Rooney | 9 |
| 14 | Leighton Baines | 9 |
| 15 | Danny Murphy | 9 |
| 16 | Aaron Ramsey | 8 |
| 17 | Martin Petrov | 8 |
| 18 | Ryan Giggs | 8 |
| 19 | Matthew Jarvis | 8 |
| 20 | Emmanuel Adebayor | 8 |
| 21 | Ashley Young | 7 |
| 22 | Frank Lampard | 7 |
| 23 | Patrice Evra | 7 |
| 24 | Joey Barton | 7 |
| 25 | Bobby Zamora | 7 |


Well, the picture is changing. Robin van Persie stays high, but Peter Crouch disappears. David Silva is the leader with 17 goals created. Stéphane Sessegnon and Morten Gamst Pedersen, both with 10 goals created, do not play for top teams but rank above Wayne Rooney and Frank Lampard. I think this is quite interesting, but we can go further.

With another simple formula I created, you can convert a player's contribution into wins.

                  Wins Created = (Goals + Goals Created) * Goal in Win Rate
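A rough sketch of this formula in Python follows. How the Goal in Win Rate is computed is not spelled out here, so the sketch assumes it is the team's wins divided by the team's goals. That definition, the function names and the numbers are all assumptions for illustration.

```python
def goal_in_win_rate(team_wins, team_goals):
    """Weight of one goal in terms of wins (assumed: team wins / team goals)."""
    if team_goals == 0:
        return 0.0
    return team_wins / team_goals


def wins_created(goals, goals_created, rate):
    """Wins Created = (Goals + Goals Created) * Goal in Win Rate."""
    return (goals + goals_created) * rate


# Made-up numbers, for illustration only.
win_rate = goal_in_win_rate(team_wins=20, team_goals=80)
raw_wins = wins_created(goals=12, goals_created=8, rate=win_rate)
print(win_rate, raw_wins)  # 0.25 5.0
```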

The Goal in Win Rate is simply the weight of a goal in terms of wins. This helps correct for high-scoring teams by reducing the weight of each goal. For teams with few wins, each goal has a higher value than for teams with many wins and lots of goals.
I then corrected this value to each team's real number of wins. For example, a team's players might add up to 50 Wins Created when the team only won 20 games. So I kind of normalized the data, so that it keeps a real meaning. I called this the Corrected Wins Created.
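One way to read this correction is as a proportional rescaling, so that a squad's values add up to the team's real number of wins. The sketch below assumes exactly that. The scaling method, the function name and the numbers are illustrative assumptions, not necessarily the method used for the table.

```python
def corrected_wins_created(raw_by_player, team_wins):
    """Rescale raw Wins Created so the squad total equals the team's wins."""
    total = sum(raw_by_player.values())
    if total == 0:
        return {name: 0.0 for name in raw_by_player}
    scale = team_wins / total
    return {name: raw * scale for name, raw in raw_by_player.items()}


# Made-up squad whose raw values sum to 50 for a team that won 20 games.
raw = {"Player A": 25.0, "Player B": 15.0, "Player C": 10.0}
corrected = corrected_wins_created(raw, team_wins=20)
```

With these numbers every raw value is multiplied by 20 / 50, so the ranking inside the squad is unchanged and only the scale moves.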
I think one could also keep the uncorrected Wins Created, because it only favors attackers and midfielders. I'm working on the same kind of stats for defenders and keepers, and their results are likely to be negative. So when adding up every player's Wins Created, you should end up with the team's real number of wins. All right, enough talking, now the top 25 wins creators:


| Rank | Player full name | Corrected Wins created |
|---|---|---|
| 1 | Wayne Rooney | 8.4 |
| 2 | Robin van Persie | 7.9 |
| 3 | Sergio Agüero | 6.2 |
| 4 | Clint Dempsey | 5.5 |
| 5 | Demba Ba | 4.9 |
| 6 | Emmanuel Adebayor | 4.6 |
| 7 | David Silva | 4.4 |
| 8 | Luis Suárez | 4.1 |
| 9 | Stéphane Sessegnon | 4.0 |
| 10 | Juan Mata | 3.8 |
| 11 | Rafael van der Vaart | 3.8 |
| 12 | Edin Dzeko | 3.7 |
| 13 | Papiss Demba Cissé | 3.7 |
| 14 | Frank Lampard | 3.6 |
| 15 | Gareth Bale | 3.6 |
| 16 | Grant Holt | 3.5 |
| 17 | Danny Welbeck | 3.4 |
| 18 | Samir Nasri | 3.4 |
| 19 | Yohan Cabaye | 3.4 |
| 20 | Peter Crouch | 3.2 |
| 21 | Javier Hernández | 3.2 |
| 22 | Mario Balotelli | 3.2 |
| 23 | Ashley Young | 3.1 |
| 24 | Nicklas Bendtner | 3.0 |
| 25 | Mikel Arteta | 3.0 |

Not surprisingly, Robin van Persie is still near the top with 7.9 Wins Created. But the #1 is Wayne Rooney with 8.4 Wins Created. Stéphane Sessegnon keeps a nice 9th place with 4.0 Wins Created. Demba Ba, the former Newcastle striker, has 4.9 Wins Created. David Silva, our top creator, has "only" 4.4 Wins Created. Clint Dempsey, with 5.5 Wins Created, will certainly leave Fulham weaker this year after his transfer. Mario Balotelli, despite being... Mario Balotelli, has 3.2 Wins Created. Not bad when you look at his behaviour.

With all the data I collected, I created a third variable, Goal Situations Created. As I'm still working on improvements to the model, I will not give details. I based my work on Bill James' Runs Created formula. The results for now:


| Rank | Player full name | Goal situations created |
|---|---|---|
| 1 | Robin van Persie | 46 |
| 2 | Wayne Rooney | 37 |
| 3 | Sergio Agüero | 35 |
| 4 | Emmanuel Adebayor | 33 |
| 5 | Clint Dempsey | 26 |
| 6 | Demba Ba | 21 |
| 7 | Edin Dzeko | 21 |
| 8 | Grant Holt | 20 |
| 9 | Luis Suárez | 20 |
| 10 | Danny Graham | 19 |
| 11 | Mario Balotelli | 19 |
| 12 | Rafael van der Vaart | 18 |
| 13 | Papiss Demba Cissé | 18 |
| 14 | Gareth Bale | 17 |
| 15 | Danny Welbeck | 17 |
| 16 | Daniel Sturridge | 17 |
| 17 | Frank Lampard | 16 |
| 18 | David Silva | 16 |
| 19 | Theo Walcott | 16 |
| 20 | Steven Fletcher | 16 |
| 21 | Javier Hernández | 16 |
| 22 | Peter Crouch | 15 |
| 23 | Peter Odemwingie | 15 |
| 24 | Jermain Defoe | 15 |
| 25 | Darren Bent | 15 |


To conclude, I want to say that I enjoyed toying with the data and found some interesting stuff. I created two new statistics, Goals Created and Wins Created, using relatively easily accessible data. Unlike American sports, football (or soccer for my American readers) has only a little publicly available data.