Edit: There is another version of this article available as a PDF, which includes more explicit mathematical formulas and an example worked in gruesome detail.
We all know that some games are easier to play than others, and we all make adjustments in our head and in our arguments that make reference to these ideas. Three points out of a possible six on that Californian road-trip are good, considering how good those teams are; putting up 51% possession numbers against Buffalo or Toronto or Ottawa or Colorado just isn’t that impressive considering how those teams normally drive play, or, err, don’t.
These conversations only intensify as the playoffs roll around. Really, how good are the Penguins, who put up big numbers in the “obviously” weaker East, compared to Chicago, who are routinely near the top of the “much harder” West? How can we compare Pacific teams, of which all save Calgary have respectable possession numbers, with Atlantic teams, who play lots of games against the two weak Ontario teams and the extremely weak Sabres?
Intuitively, we know that such adjustments are necessary. For instance, if our favourite team repeatedly puts up 50% possession numbers against teams that we somehow know are 60% possession teams, it would be idiotic to say that our team is “a 50% team”; they’re clearly holding their own against 60% teams, and it’s much more credible to say that our favourite team is also a 60% team. Various people have developed schemes for adjusting win-loss records, as well as point systems that are explicitly geared at prediction, such as RPI, various Elo-style schemes, or ZRatings.
I am very fond of predictive statistics, as connoisseurs of my previous articles will doubtless be aware, but I view such adjustments as primarily descriptive. I want to be able to look at a length of time and say “during this time, my favourite team played like a 50% team”, for instance. In this way I hope to escape being deceived by strong results over weak opponents or by weak results against strong opponents. (As it happens, I find that schedule-adjusted measures are, as one might expect, more predictive than the corresponding “raw” measures. But that is another matter, for another day.)
In this article, I focus on adjusting possession statistics for strength-of-schedule. For reasons that will be fully explained in future articles, I intensely dislike measuring possession in percentages, and prefer to maintain separate tallies of offence and defence. The method I present here takes raw measurements of possession, expressed as counts (possibly adjusted for other effects, such as score effects, home/road effects, time-of-game effects, or rink effects), and produces schedule-adjusted counts.
First, fix a set of games to consider. It is not important that the number of games played by each team be the same. Next, form the matrix of raw events that are of interest. In our case, we’ll use corsi events, but the adjustment method works for anything that can be counted. Specifically, form the 30-by-30 matrix whose (i,j) entry is the average number of events obtained by team i against team j in the games under consideration.
We want to reward teams that do “better than expected [against a given team]” and punish teams that do “worse than expected [against a given team]”. As a starting point, we compute the average number of events per team per game in the set of games under consideration, that is, the sum of all the matrix entries, divided by two times the total number of games (two, because each game involves two teams); we call this number m. In the set of games being considered, teams allowing fewer than m events are ‘strong’ defensively; teams generating more than m events are ‘strong’ offensively, and so on.
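As a minimal sketch of this step, the average m could be computed as below. The data layout is an assumption, since the article does not spell one out: S[i][j] is taken to hold the total number of events team i generated against team j, and a companion matrix G (a name introduced here purely for illustration) holds the number of games between each pair of teams. The three-team numbers are invented.

```python
# Hypothetical three-team league; every number here is invented.
# S[i][j]: total events generated by team i against team j (assumed layout).
# G[i][j]: games played between teams i and j (symmetric, zero diagonal).
S = [[0, 30, 30],
     [25, 0, 25],
     [20, 20, 0]]
G = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]


def league_average(S, G):
    """Average events per team per game over the sample."""
    total_events = sum(sum(row) for row in S)
    # G counts each game once for each of its two teams, so the sum of
    # its entries is already two times the total number of games.
    team_games = sum(sum(row) for row in G)
    return total_events / team_games


m = league_average(S, G)
print(m)  # 25.0
```

Here the 150 events are spread over three games, that is, six team-games, giving 25 events per team per game.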
We want to use this matrix, which we call S and which will not change, to estimate the relative offensive and defensive ability of all the teams. We begin by implicitly assuming that they are all equally good at offence and defence; we will discover the truth in time.
To make our first adjustment, we first add up all of the events generated by a given team; that is, add up all of the entries in a given row. This produces a vector whose entries are the total number of events for each team over the sample at hand. We divide the entries in this vector by the number of games played against each team; and finally we divide the vector by m, the average number of events generated by a team in the sample. This gives a vector whose entries are the relative offensive strengths of the teams, compared to average. An entry of 1.10 represents a team generating, on average, 10% more events than average, and an entry of 0.85 represents a team generating, on average, 15% fewer events than average. This is our first estimate of offensive strength. Let us call this vector of offensive strengths F1.
Similarly, we make a calculation of defensive strength by summing the columns of S, that is, the events allowed by a given team. Dividing the entries of this vector by the number of games played between the relevant teams, and then dividing by the average number of events m, gives a vector whose entries are a measure of each team’s defensive ability; an entry of 1.10 is a team that is permitting 10% more events than average and an entry of 0.85 represents a team that is permitting 15% fewer events than average. Let us call this vector of defensive strengths A1.
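The two first estimates might be sketched as follows. This is one reading of the description, under stated assumptions: S[i][j] holds total events by team i against team j, G[i][j] (a hypothetical helper matrix) holds games between the pair, each row or column total is divided by that team's total games played, and every team has played at least one game. The data are invented.

```python
def first_estimates(S, G):
    """Return (F1, A1): raw offensive and defensive strength relative to average."""
    n = len(S)
    m = sum(map(sum, S)) / sum(map(sum, G))  # events per team per game
    games = [sum(G[i]) for i in range(n)]    # games played by each team
    # Row sums: events generated by team i, per game, relative to m.
    F1 = [sum(S[i]) / games[i] / m for i in range(n)]
    # Column sums: events allowed by team j, per game, relative to m.
    A1 = [sum(S[i][j] for i in range(n)) / games[j] / m for j in range(n)]
    return F1, A1


# Invented three-team round robin, one game per pairing.
S = [[0, 30, 30],
     [25, 0, 25],
     [20, 20, 0]]
G = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

F1, A1 = first_estimates(S, G)
print([round(x, 2) for x in F1])  # [1.2, 1.0, 0.8]
print([round(x, 2) for x in A1])  # [0.9, 1.0, 1.1]
```

In this toy league the first team generates 20% more events than average and allows 10% fewer, before any account is taken of whom it played.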
We can use these vectors (F1,A1) to further refine our estimates of strength. Now that we have an idea of the strengths of the teams, we can re-interpret their event totals in that light. Form the weighted sum of each row in S, where each weight is the inverse of the corresponding entry in A1 divided as before by the number of games played against that opponent and the average stat m. When entries of A1 are high, that is, when considering games against teams that give up lots of shots, the effective sum is decreased. When entries of A1 are low, that is, when considering games against stingy defensive teams, the effective sum is increased. Call this vector of adjusted offensive tallies F2.
To illustrate, let’s examine an imagined fragment of this calculation.
Suppose that team X has played two opponents, Y and Z, recording 40 events against team Y and 20 events against team Z. If the entry for team Y in A1 is 1.1 and the entry for team Z in A1 is 0.9, then this calculation would produce a total of 40/1.1 + 20/0.9 = 36.4 + 22.2 = 58.6 events. The 40 events against team Y are counted as 36.4 since team Y gives up a lot of events (10% more than average), but the 20 events against team Z are counted as 22.2, since team Z is stingy (giving up 10% fewer events than average). In our imaginary example there is very little change, since team X put up fairly ordinary numbers against varied opposition.
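The arithmetic of this fragment can be checked in a few lines. The dictionary names are chosen here for illustration only; the numbers are the ones from the imagined example.

```python
# Team X's raw event counts against each opponent.
events = {"Y": 40, "Z": 20}
# Defensive strength (A1 entry) of each opponent, relative to average.
A1 = {"Y": 1.1, "Z": 0.9}

# Weight each count by the inverse of the opponent's defensive strength.
adjusted = {opp: events[opp] / A1[opp] for opp in events}
total = sum(adjusted.values())

for opp, value in adjusted.items():
    print(opp, round(value, 1))  # Y 36.4, then Z 22.2
print(round(total, 1))           # 58.6
```

The raw total is 60 events, so the weighting moves team X's tally by less than two and a half percent.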
Similarly, to adjust our measure of defensive ability, form the weighted sum of all the entries in a column of S, where the weights are the inverses of the vector of strengths, F1, computed previously, divided by the number of games against each opponent, divided by the average statistic m. Call this vector A2, our refined vector of coefficients of defensive strength.
So, we have a way of taking estimates (Fi,Ai) of offensive and defensive strength, and turning them into better measures (Fi+1,Ai+1). We repeat this process many, many times (a hundred times is usually enough, computers help here) until the output vectors are imperceptibly different from the input vectors, and we consider these vectors (F,A) to be the true schedule-adjusted offensive and defensive skill of the teams as shown over the games in question.
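A rough sketch of the repeated refinement follows. It is one possible reading of the description rather than a definitive implementation, and it rests on assumptions not stated in the article: S[i][j] holds total events by team i against team j, G[i][j] (a hypothetical helper matrix) holds games between the pair, both vectors are updated together from the previous pair, and every team has played games and both generated and allowed events. The function names, the tolerance and the data are all invented for illustration.

```python
def adjust_step(S, G, m, F, A):
    """One refinement: compute (F_next, A_next) from the current (F, A)."""
    n = len(S)
    games = [sum(G[i]) for i in range(n)]
    # Offence: events against each opponent, weighted by the inverse of
    # that opponent's defensive strength, per game, relative to m.
    F_next = [sum(S[i][j] / A[j] for j in range(n)) / games[i] / m
              for i in range(n)]
    # Defence: events allowed to each opponent, weighted by the inverse
    # of that opponent's offensive strength, per game, relative to m.
    A_next = [sum(S[i][j] / F[i] for i in range(n)) / games[j] / m
              for j in range(n)]
    return F_next, A_next


def schedule_adjust(S, G, max_iter=100, tol=1e-9):
    """Iterate from 'all teams equal'; report whether the iterates settled."""
    n = len(S)
    m = sum(map(sum, S)) / sum(map(sum, G))
    F, A = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        F_next, A_next = adjust_step(S, G, m, F, A)
        change = max(abs(x - y) for x, y in zip(F_next + A_next, F + A))
        F, A = F_next, A_next
        if change < tol:
            return F, A, m, True
    return F, A, m, False


# Invented three-team round robin, one game per pairing.
S = [[0, 30, 30],
     [25, 0, 25],
     [20, 20, 0]]
G = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

F, A, m, settled = schedule_adjust(S, G)
print([round(x, 3) for x in F], [round(x, 3) for x in A], settled)
```

The returned flag only reports whether successive iterates agreed to within the chosen tolerance; nothing in this sketch guarantees that they will, so it is worth inspecting rather than assuming.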
Once we have the vector F, describing offensive ability (relative to the sample at hand), and the vector A, describing defensive ability, it is simple to compute the schedule-adjusted statistic: simply multiply each element of F by the average statistic m to obtain the offensive schedule-adjusted statistic, and multiply each element of A by m to obtain the defensive schedule-adjusted statistic.
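In code, this last step is a single multiplication per team. The function name and the sample values below are invented for illustration.

```python
def adjusted_counts(F, A, m):
    """Convert relative strengths back into per-game event counts."""
    offence = [m * f for f in F]  # schedule-adjusted events for, per game
    defence = [m * a for a in A]  # schedule-adjusted events against, per game
    return offence, defence


# Invented values: two teams, league average of 50 events per game.
offence, defence = adjusted_counts([1.10, 0.85], [0.95, 1.05], 50)
print([round(x, 1) for x in offence])  # [55.0, 42.5]
print([round(x, 1) for x in defence])  # [47.5, 52.5]
```

Keeping the two tallies separate, rather than folding them into a percentage, matches the preference for separate offensive and defensive counts stated earlier.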
Of course, early in the season, the matrix S will have a lot of zeros in it, since many teams will not have played one another. Even now, around a quarter of the way through the season, many of the entries are zero. When “too many” of the entries of S are zero (where the technical sense of ‘too many’ is beyond the scope of this article), the process I describe above will not settle gently on one value, but will instead behave chaotically. In this case, there is no schedule-adjustment. To make adjustments, one has to gauge the strength of one’s opponents; doing so requires using their opponents, who are in turn evaluated using their opponents, and so on. If too few games have been played, then there is no information with which to perform these evaluations.
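An extreme, invented case shows one way the process can fail to settle: a "league" of two teams who have only played each other. The sketch assumes S[i][j] holds total events by team i against team j, G[i][j] (a hypothetical helper matrix) holds games between the pair, and both vectors are refined together from the previous pair. It illustrates the behaviour only and says nothing about the technical condition left out of scope above.

```python
def adjust_step(S, G, m, F, A):
    """One refinement: compute (F_next, A_next) from the current (F, A)."""
    n = len(S)
    games = [sum(G[i]) for i in range(n)]
    F_next = [sum(S[i][j] / A[j] for j in range(n)) / games[i] / m
              for i in range(n)]
    A_next = [sum(S[i][j] / F[i] for i in range(n)) / games[j] / m
              for j in range(n)]
    return F_next, A_next


# Two teams, one game: 30 events for the first team, 20 for the second.
S = [[0, 30],
     [20, 0]]
G = [[0, 1],
     [1, 0]]
m = sum(map(sum, S)) / sum(map(sum, G))  # 25 events per team per game

state = ([1.0, 1.0], [1.0, 1.0])
history = []
for _ in range(4):
    state = adjust_step(S, G, m, *state)
    history.append(state)

for F, A in history:
    print([round(x, 2) for x in F], [round(x, 2) for x in A])
# The iterates alternate: [1.2, 0.8] [0.8, 1.2], then [1.0, 1.0] [1.0, 1.0],
# then the same two states again.
```

With a single game there is no way to tell whether the first team has a strong offence or the second a weak defence, and the iterates flip between the two readings instead of choosing one.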
Those cursed souls who follow me on Twitter will already have seen the results of these schedule-adjustments, since I tweet plots like the one below every day. This example shows 5v5 score-and-venue adjusted corsi, in black, and schedule-adjusted score-and-venue adjusted corsi, in red. Buffalo is so dire (by both measures) that I have omitted them from the chart to permit the schedule-adjustments to be seen.
Most teams are more or less unchanged by the adjustment, which makes sense: most teams have played a representative set of opponents and put up ordinary numbers. However, several notable things jump out all the same. Dallas, for instance, has a much better offence than appears from their raw counts, since they’ve been putting up decent numbers against, on average, very stingy teams. Their defensive numbers also improve somewhat. Toronto has been putting up offence as expected but concedes more events than they should considering their weak schedule so far. Boston is substantially inflated by weak opposition; Winnipeg is the opposite.
Another infuriating consequence of the lockout, in case we were somehow short of such, is that the shortened schedule contained no regular-season inter-conference play. This means that, for that season, it is only possible to perform schedule-adjustments within each conference. How vexing.
Unnecessary Mathematical Uncertainties
The process of forming (Fi+1,Ai+1) from (Fi,Ai) is an endomorphism of Rn x Rn, which I call Sadj. Performing the adjustment amounts to computing a fixed point of Sadj. As mentioned above, when S is sparse, Sadj may not have a fixed point in the component of Rn x Rn containing 1 x 1; ideally we should be able to characterize the fixed points of Sadj using the properties of S. The function Sadj strongly resembles a Markov process, for which fixed points are well-understood, but I cannot seem to rewrite it as such. The main obstacle is the inversion of the coefficients at every step, which does not play nicely with matrix multiplication. Even Brouwer’s fixed point theorem does not appear to apply. Alas.