Introducing Expected Plus-Minus

This is Part 2 of a 5-part series detailing my WAR model. Part 1 of the series can be found here.

Introduction

“Basically, anything that WOWY can do, I think can be done better with regression-type methods” — Andrew C. Thomas, Lead Hockey Researcher, Minnesota Wild

Adjusted Plus-Minus metrics were first introduced into NBA circles around 2004 by Dan Rosenbaum. The basketball community has since seen many iterations, including those of Steve Ilardi/Aaron Barzilai, Joseph Sill, and Jeremias Engelmann. Soon after these metrics made their debut in the public sphere, they were adopted for hockey, where they have seen many iterations of their own: Schuckers/D. Lock/Wells/Knickerbocker/R. Lock, Brian Macdonald, Gramacy/Jensen/Taddy, Thomas/Ventura/Jensen/Ma, and Emmanuel Perry. I even made my own attempt in the summer of 2015, which I coined Corsi Plus-Minus. For whatever reason, these metrics have struggled to take hold in the hockey community, unlike in basketball circles.

Regression Example

This model is a version of a multivariable regression. For those of you who might not be familiar with regressions, here is a relatively simple example to help you grasp the concept.

y = β0 + β1X1 + β2X2 + β3X3

• y – Response Variable – How well will you do on your test? Measured in points
• X – Explanatory Variables
• X1 – How long did you study? Measured in hours
• X2 – How long did you play video games? Measured in hours
• X3 – Did you go to the study session? Yes/No
• This is an example of a dummy variable
• β – Coefficients – The value of each explanatory variable

So if you had a sheet of data with everyone’s test score (the response variable) along with the values of the 3 explanatory variables, and you ran that data through a multiple regression, you would get something that looks like this (reminder: I made all these numbers up):

y = 75 + 2X1 – 3X2 + 6X3

For the purposes of my methodology, the regression coefficients are what we will be focusing on. Looking at coefficient β1, for every extra hour you studied (holding all other explanatory variables equal) you can expect to score 2 points better on the test. Looking at the dummy variable, X3, it can either be a 1 (yes, you did go to the study session) or 0 (no, you did not go to the study session). If you go to the study session (holding all other explanatory variables equal), you can expect to do 6 points better on your test.
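To make this concrete, here is a minimal sketch of the test-score example in Python. The student data is entirely made up, and the scores are generated directly from the made-up formula above, so fitting the regression just recovers those same coefficients:

```python
import numpy as np

# Hypothetical data: each row is one student
# columns: hours studied, hours of video games, attended study session (1/0)
X = np.array([
    [10, 2, 1],
    [ 4, 8, 0],
    [ 7, 3, 1],
    [ 2, 6, 0],
    [ 9, 1, 1],
    [ 5, 5, 0],
], dtype=float)

# Test scores generated from the made-up coefficients: 75 + 2*X1 - 3*X2 + 6*X3
y = 75 + 2 * X[:, 0] - 3 * X[:, 1] + 6 * X[:, 2]

# Ordinary least squares: prepend a column of 1s for the intercept β0
A = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coefs)  # recovers [75, 2, -3, 6]
```

Real data would of course include noise, so the fitted coefficients would only approximate the true ones.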

Methodology

Using the play-by-play data provided by NHL.com I was able to look at every even-strength on-ice event that took place from 2007-2016. From this data we set up a regression where our variables are:

• Response Variable – The rate at which an Expected Goals event took place
• Explanatory Variables
• All players on the ice
• The coaches of both teams
• Zone the shift started in
• If one of the teams is playing a back-to-back

Now back to how this relates to my methodology. Picture every player as a dummy variable in the dataset (that player was either on the ice during a shift or they weren’t). Including all players in our regression allows us to simultaneously account for the impact of both teammates and opponents. Our regression will then give us coefficients that tell us (holding all other factors equal) how much of an impact Player X has on his team’s Expected Goals, a value I have coined Expected Plus-Minus (XPM).
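A minimal sketch of what that design matrix might look like in Python. The shift records and the sign convention (+1 for home players on the ice, -1 for away players, 0 for everyone on the bench) are my illustrative assumptions, not necessarily the exact encoding used in the model:

```python
import numpy as np

# Hypothetical shift records: players on the ice for each team, plus the outcome
shifts = [
    {"home": ["Player A", "Player B"], "away": ["Player C"], "xg_diff": 0.4},
    {"home": ["Player A"], "away": ["Player B", "Player C"], "xg_diff": -0.1},
]

players = sorted({p for s in shifts for p in s["home"] + s["away"]})
col = {p: i for i, p in enumerate(players)}

# One dummy column per player: +1 if on the ice for the home team,
# -1 if on the ice for the away team, 0 if on the bench
X = np.zeros((len(shifts), len(players)))
y = np.zeros(len(shifts))
for i, s in enumerate(shifts):
    for p in s["home"]:
        X[i, col[p]] = 1.0
    for p in s["away"]:
        X[i, col[p]] = -1.0
    y[i] = s["xg_diff"]

print(players)  # ['Player A', 'Player B', 'Player C']
print(X)
```

Because every player on the ice appears in every row, the regression distributes credit for each shift’s outcome across teammates and opponents simultaneously.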

However, due to the collinear nature of our data, as was touched upon earlier, we would be better served to use a ridge regression (also known as Tikhonov Regularization) to help account for it. This method adds a penalty factor to the regression for results being far away from the mean. This penalty factor, called lambda, is chosen based on a 10-fold cross-validation. This helps remove a lot of the noise accompanied with such a process.
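Here is a sketch of ridge regression with 10-fold cross-validation over lambda, written against synthetic data (the deliberately collinear features stand in for teammates who share most of their ice time). This is a from-scratch illustration of the technique, not the model’s actual code:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the shift data: 200 observations, 10 features,
# with two columns made nearly identical to mimic collinear teammates
n, p = 200, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=10):
    """Mean squared prediction error across k held-out folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        b = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return np.mean(errs)

# Pick the lambda that predicts held-out data best
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lambdas, key=lambda lam: cv_error(X, y, lam))
coefs = ridge_fit(X, y, best)
print(best, coefs[:3])
```

The larger the penalty, the harder the coefficients are pulled toward zero; cross-validation lets the data decide how much shrinkage actually improves out-of-sample prediction.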

Players with minimal time on ice would only add more noise to the regression, so only players deemed NHL regulars – those ranking in the top 390 forwards and 210 defensemen in even-strength time on ice (approximately 13 forwards and 7 defensemen per team) – are kept in the regression as individual players. All players who do not meet that threshold are grouped together as replacement players. While some talented players will fall into this group (e.g. due to injuries), the vast majority of players under this umbrella are most accurately described as replaceable (fringe AHLers). This replacement-level group provides value by stabilizing our regression and giving us a baseline to compare player performance against. That’s how we move from ranking players as “above-average” to “above-replacement.” The significance of baselining players to replacement level will be revisited later.
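The cut between regulars and replacement players can be sketched as follows. The tiny player pool, TOI numbers, and helper name are all hypothetical; only the top-390-forwards / top-210-defensemen rule comes from the text:

```python
# Hypothetical per-player even-strength TOI (minutes) and positions
toi = {"Forward A": 1200.0, "Forward B": 90.0, "Dman C": 1000.0, "Dman D": 45.0}
position = {"Forward A": "F", "Forward B": "F", "Dman C": "D", "Dman D": "D"}

def keep_set(toi, position, n_f=390, n_d=210):
    """Top n_f forwards and n_d defensemen by TOI are kept as individual
    dummy variables; everyone else gets merged into one replacement column."""
    fwds = sorted((p for p in toi if position[p] == "F"), key=toi.get, reverse=True)
    dmen = sorted((p for p in toi if position[p] == "D"), key=toi.get, reverse=True)
    return set(fwds[:n_f]) | set(dmen[:n_d])

# Tiny limits so the toy example actually splits the four players
regulars = keep_set(toi, position, n_f=1, n_d=1)
print(sorted(regulars))  # ['Dman C', 'Forward A']
```

In the design matrix, the columns of all non-regulars would then be summed into a single “replacement” dummy, whose coefficient becomes the baseline the regulars are measured against.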

Adding a coaching variable allows us to capture team effects and works the same way as the player variables. The regression will then provide coefficients that explain (holding all other factors equal) how much of an impact Coach X has on his team’s Expected Goals. The decision to capture team effects through coaches instead of the team itself was due to the lower cross-validation error when using coaches compared to teams. Using both team and coaches as variables seems to only make matters worse as the model has trouble deciding how to distribute credit/blame between the two. The coefficients of these coaching variables will be further explored at a later date.

A problem with these models is that even though we are dealing with hundreds of thousands of events, the results can still be very noisy due to the sheer number of components we are trying to isolate. This comes down to two main reasons, which we’ll compare to basketball, where this method originated. First, NHL players don’t get as much playing time as NBA players. James Harden led the NBA with 3,125 minutes in 2015-16, while Ryan Suter led the NHL with only 1,748 minutes. Second, even though we are measuring Expected Goals here (which are only recorded for unblocked shot attempts), these events are much rarer than points in the NBA. The average NHL team took about 2,600 unblocked shot attempts last year, while the average NBA team scored roughly 8,400 points.

We can help stabilize our coefficients (player ratings) by giving each coefficient in our model a prior from the previous season. Without these priors our model would start every year fresh, assuming that every player was league average. Since ridge regression pulls ratings toward priors, Oliver Ekman-Larsson’s XPM moves toward a higher number than Connor Murphy’s (assuming the priors rate Ekman-Larsson better than Murphy). The prior is a mathematical way to say, “If you’re not sure who should get credit for this, it’s probably the guy who we already think is better.” – (Alex Suchman). This is an extremely valuable tool because it can greatly improve the model’s allocation of credit and blame. These priors are applied to every explanatory coefficient, not just players.
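One standard way to make ridge regression shrink toward a prior instead of toward zero is to fit the deviation from the prior and add the prior back. This is my sketch of that trick, not necessarily the exact mechanism used in the model, and the prior vector here is made up:

```python
import numpy as np

def ridge_with_prior(X, y, prior, lam):
    """Ridge regression shrunk toward a prior instead of toward zero.

    Fit only the deviation from the prior, then add the prior back, so the
    lambda penalty pulls each coefficient toward last season's rating
    rather than toward the league-average value of zero.
    """
    p = X.shape[1]
    delta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - X @ prior))
    return prior + delta

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=50)

prior = np.array([0.8, -0.4, 0.0])  # hypothetical last-season ratings
coefs = ridge_with_prior(X, y, prior, lam=1e4)
print(coefs)  # with a huge lambda, the coefficients stay close to the prior
```

Two sanity checks on this formulation: with lambda = 0 it reduces to ordinary least squares (the prior cancels out entirely), and as lambda grows the ratings collapse onto last season’s values. A season’s worth of new data moves each player somewhere in between.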

One downside of using priors is that the model doesn’t tell us about a single season in a vacuum. It does, however, allow us to better isolate a player’s individual talent. It’s the same logic that goes into HERO Charts showing a regressed version of a player’s last three seasons. It also doesn’t prevent a player’s rating from moving up or down. A good season will still cause you to have a higher rating and a bad season will cause you to have a lower rating. To further explain the impact of priors I will share this explanation from Alex Suchman with some slight tweaks so that it applies to this model:

[Expected Plus-Minus] does NOT measure how well a player has performed this season.

Statistics can be divided into two categories. Descriptive statistics tell us about what happened in the past. For instance, I can check how many page views this blog post has. Predictive statistics try to forecast what will happen in the future. I could create a model that estimates how many page views I’ll get over the next 24 hours. The difference between the two is subtle, but important.

[Expected Plus-Minus] is meant to be predictive. It’s interested in how well a player will perform in the future, rather than what he did in the past. [XPM’s] emphasis on prediction explains why it uses some of the tricks it does.

For instance, I mentioned earlier that [XPM] uses data from [the previous season] in its priors. If my primary goal is to evaluate how well a player did this season, it wouldn’t make a lot of sense to use data from [last season]. However, if I want to predict what will happen in the future, the older numbers can help differentiate between players who have been consistently good (and will likely keep being good) and players who are merely going through a hot streak (and will likely regress to their mean).

This method can also be repeated for special teams, though power-play defense and short-handed offense were removed due to noise issues. Special teams are even more challenging than even-strength play for the reasons stated above: small samples are the norm, which makes this type of analysis more difficult still.

Data will be released for public consumption in the final section of the write-up. XPM is presented as a per-60 rate stat. Below are the top 10 offensive and defensive performances from the 2015-2016 season for both forwards and defensemen according to XPM (minimum 800 minutes TOI for forwards and 1,000 minutes for defensemen):

Conclusion

Through the use of ridge regression and Bayesian techniques, Expected Plus-Minus is a step forward in analyzing shot-attempt numbers. While it does not allow us to isolate the cause of each player’s rating, whether that comes from teammates or competition, it does provide a more accurate representation of a player’s value than more commonly used methods. XPM can be used as a starting point for more granular analysis to uncover the exact causes of a given rating.

Please let me know if you have any thoughts, questions, concerns or suggestions. You can comment below, reach me via email at DTMAboutHeart@gmail.com, or find me on Twitter @DTMAboutHeart.

3 thoughts on “Introducing Expected Plus-Minus”

1. Can you give some more details about the regression setup and what “Rate at which an Expected Goals event took place” means as a target?

My initial impression is the following:
– Every “event” enters as a row
– The target variable is a binary based on whether the event is an “expected goal” or not
– You use (regularized) logistic regression to fit the feature weights

Is that an accurate understanding? If so, what are the types of “events” that enter as rows? If a shift has no such events, is it not included in the regression?