Everybody has their interests. One of mine is trying to understand how much a player is worth, in both analytic and economic terms. Baseball makes that relatively easy: decades of research have given us a better (but by no means definitive) understanding of how runs are created and saved and how games are won. Payrolls are relatively transparent, which also lets us convert a player’s production into economic value.

In soccer (or football, but I’ll stick to the American term) financials aren’t as public or as widely available, so economic production is very hard to pin down (from the outside, at least). We can, however, start making some attempts to define a player’s production on the field. The release of the data from MCFC Analytics should help. There is a huge amount that can be gathered even from their basic spreadsheet: we can search for fun factoids, look for correlations between different events in order to ultimately understand what wins and loses games, and a lot more.

What I would love to do is develop an initial framework for player valuations. One huge disclaimer: I’m working with a small sample of only one year of data, and there are other limitations that we will discuss once the project is complete. Still, we’re talking about a basic framework, and we should be able to implement that.

Without wasting further time, let’s get started. In baseball, one of the ways to measure a player’s production is WAR (Wins Above Replacement). Leaving individual implementations aside, the concept is that a player’s contributions on all sides of the ball are summed up and compared against the contributions of the “replacement player” (defined as freely available talent that can be obtained at virtually no cost at any time; this level can be estimated empirically). Those contributions are measured in Runs, the primary currency of baseball games, and the Runs are later converted into Wins.

It stands to reason that we could, or should, measure a soccer player’s value in Goals contributed above or below a threshold. We can’t derive quantitatively what a replacement player is worth in soccer, since we don’t have any data for lower leagues (such as the Championship). So we can either set the level artificially, which I don’t like, or measure a player’s contributions against the average English Premier League (EPL) player, a more elegant solution in my opinion.

The conversion from Goals to Wins (or Points) isn’t going to be as straightforward as in baseball, where you either win or lose. Soccer has draws (or ties), and points are assigned non-linearly: a goal that ties a game and a goal that wins it are both extremely important, but a win is worth 2 points more than a draw, while a draw is worth only 1 point more than a loss. This doesn’t mean that the conversion can’t be done, but it’s currently beyond the scope of our framework.
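To make the non-linearity concrete, here is a minimal sketch in Python. It assumes the usual 3/1/0 scoring (3 points for a win, 1 for a draw, 0 for a loss), which matches the "2 points above a draw, 1 point above a loss" gaps described above. The function names are mine and purely illustrative.

```python
# Assumed scoring: 3 points for a win, 1 for a draw, 0 for a loss.
def match_points(goals_for: int, goals_against: int) -> int:
    """League points earned from a final score."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def points_from_next_goal(goals_for: int, goals_against: int) -> int:
    """Extra points gained if the team scores once more and the game ends there."""
    before = match_points(goals_for, goals_against)
    after = match_points(goals_for + 1, goals_against)
    return after - before


# The same single goal is worth a different number of points
# depending on the scoreline it changes.
for score in [(0, 1), (1, 1), (2, 1), (0, 2)]:
    print(score, "->", points_from_next_goal(*score))
```

An equalizer (0-1 to 1-1) is worth 1 point, a go-ahead goal (1-1 to 2-1) is worth 2, and a goal scored when already ahead or when two goals down is worth nothing in the standings. That is why a flat "goals to points" exchange rate is harder to define here than a "runs to wins" rate in baseball.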

Anyway, in baseball there is a strong correlation between Team Wins (or Winning%) and Run Differential. Logic suggests that the same should hold in pretty much every sport, including soccer. Just to make sure, I ran correlations between some basic team stats and points in the standings at the end of the season. I had done this previously for the Italian Serie A in 2007/2008, and the results for the 2011/2012 EPL season are very similar. A correlation of 1.00 is perfect, and a correlation of -1.00 is also perfect, but inverse. Anything close to 0.00 indicates essentially no linear relationship. In parentheses you’ll see the r-value from the above-mentioned Serie A study (the linked post, for various reasons, presents r-squared values instead of r).

– Overall Goals scored: 0.914 (0.933)

– Overall Goals allowed: -0.835 (-0.843)

– Overall Goal difference: 0.978 (0.964)

– Home Goals scored: 0.912 (0.894)

– Home Goals allowed: -0.704 (-0.424)

– Home Goal difference: 0.950 (0.900)

– Away Goals scored: 0.753 (0.800)

– Away Goals allowed: -0.808 (-0.843)

– Away Goal difference: 0.922 (0.917)
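For anyone who wants to reproduce this kind of figure, here is a rough sketch of how an r-value between a team stat and points can be computed. It assumes the plain Pearson correlation coefficient, and the five-team table is made up purely for illustration (it is not EPL or Serie A data). Squaring r gives the r-squared value used in the linked Serie A post.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up season totals for five hypothetical teams (NOT real data).
goals_scored = [70, 60, 50, 45, 35]
goals_allowed = [30, 40, 50, 55, 60]
points = [80, 65, 50, 42, 30]
goal_diff = [gf - ga for gf, ga in zip(goals_scored, goals_allowed)]

for name, stat in [("Goals scored", goals_scored),
                   ("Goals allowed", goals_allowed),
                   ("Goal difference", goal_diff)]:
    r = pearson_r(stat, points)
    print(f"{name}: r = {r:.3f}, r-squared = {r * r:.3f}")
```

With real data, each list would hold one value per team (20 for an EPL season), in the same team order as the points list.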

Some results are obvious and some are surprising. Unsurprisingly, overall goal difference has the strongest correlation. This makes sense: in order to gain points you need to win, and in order to win you have to score more than you allow; do that consistently and you’ll climb the standings.

Home goal difference is a better predictor of points than away goal difference. The surprising part, in my opinion, comes next: while it is said that defense wins championships (a saying common to virtually all team sports), that is probably true only in a limited way, if at all. The simple fact is that the best teams tend to score a lot, especially at home, and every now and then they will run up the score, while weak teams won’t be able to win 3-0 consistently. Once we look only at goals scored or allowed, we find that goals scored is the better predictor. In fact, “home goals scored” takes the cake, well ahead of “away goals allowed”, which is in second place. It surely does seem like top teams play high-scoring games at home and lower-scoring games away (this finding also holds for the Italian Serie A). They score a lot at home and concede little or nothing away.

The lowest correlations belong to “home goals allowed” and “away goals scored”. Mind you, these correlations are still fairly strong, just clearly lower than “home goals scored”. There is more than one way to skin a cat, but teams that are incapable of scoring consistently at home will always face a steeper climb. Defense may win titles according to conventional thinking, but allowing goals at home seems to be less harmful than expected. Firepower is fundamental.
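The ranking described above can be checked directly from the EPL r-values listed earlier in this post. The short sketch below simply sorts the four home/away "scored" and "allowed" splits by the absolute value of r; the dictionary and variable names are just illustrative.

```python
# EPL 2011/2012 r-values against points, copied from the list in this post.
epl_r = {
    "Home goals scored": 0.912,
    "Home goals allowed": -0.704,
    "Away goals scored": 0.753,
    "Away goals allowed": -0.808,
}

# Strength of a correlation is its absolute value; the sign only gives direction.
ranked = sorted(epl_r, key=lambda stat: abs(epl_r[stat]), reverse=True)

for stat in ranked:
    print(f"{stat}: |r| = {abs(epl_r[stat]):.3f}")
```

Sorted this way, the order is home goals scored, away goals allowed, away goals scored, home goals allowed.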

In Italy there was even less correlation between goals allowed at home and the standings, but in general the findings are eerily similar, despite coming from a different league four years earlier. I included those results just out of curiosity; we’ll stick with the EPL ones from here on.

Regardless, the most important single factor is Overall Goal Difference. Generating a positive goal difference is fundamental. It makes sense, then, to try to express the value of individual players in terms of something that we will call “Goals Above Average” (hopefully the name is self-explanatory), very similar to the framework that works so well in baseball. I believe that other sports, such as hockey and basketball, also use “Adjusted Plus Minus” type stats to achieve the same result. How will we do this in soccer? Well, we’ll start talking about this tomorrow.