Yesterday we looked at the team and skater results from the 2016 – 2018 data that was used to train the xG model. That’s a pretty robust dataset, but it’s unfortunately a bit out of date. People care about this season, and past years are old news. So let’s take a look at the data that Corey Sznajder has tracked for 2018 – 2019 so far.
In this piece we will cover Adjusted Plus-Minus (APM) / Regularized Adjusted Plus-Minus (RAPM) as a method for evaluating skaters in the NHL. Some of you may be familiar with this process – both of these methods were developed for evaluating players in the NBA and have since been modified to do the same for skaters in the NHL. We first need to acknowledge the work of Brian Macdonald. He proposed how the NBA RAPM models could be applied for skater evaluation in hockey in three papers on the subject: paper 1, paper 2, and paper 3. We highly encourage you to read these papers as they were instrumental in our own development of the RAPM method.
While the APM/RAPM method is established in the NBA and to a much lesser extent the NHL, we feel (especially for hockey) revisiting the history, process, and implementation of the RAPM technique is overdue. This method has become the go-to public framework for evaluating a given player’s value within the NBA. There are multiple versions of the framework, which we can collectively call “regression analysis”, but APM was the original method developed. The goal of this type of analysis (APM/RAPM) is to isolate a given player’s contribution while on the ice independent of all factors that we can account for. Put simply, this allows us to better measure the individual performance of a given player in an environment where many factors can impact their raw results. We will start with the history of the technique, move on to a demonstration of how linear regression works for this purpose, and finally cover how we apply this to measuring skater performance in the NHL.
These days, everyone and their mother is going to tell you to learn to code if you want to jump into sports analytics. And while I’m not going to say “don’t do it,” I am a petty betch who really hates being told what to do (see: my ongoing resistance to yoga).
Also, I’m busy, and learning to code is a whole thing that takes time. You are also probably busy, or maybe just starting to dip your toe into sports analytics as a hobby. Maybe you’ve tried learning to code and it just doesn’t make sense to you.
None of that should discourage you from playing around with hockey data and writing up what you find. In fact, there’s a perfectly good tool you can use to visualize most of the basics. Excel!
Excel gets made fun of for many reasons, but what I see most often is cutting comments about its basic visualization tools. To put it nicely, they’re…rough.
But making pleasing, easy-to-understand viz with Excel is possible! I’ve done it! Multiple times!
So, I’ve written down some of my best tips, most of which are applicable when you’re using a more powerful program, too.
1) Know what you want to show and why you want to show it.
Analytics, so hot right now. But how do you get started? People from all sorts of backgrounds and levels of expertise have contributed valuable work to hockey analytics, but the journey can feel daunting.
In this post, I want to lay out my personal advice for what knowledge and skills are needed and how to get them. Your mileage will vary, but I think much of this will be useful to anyone who is interested in starting to do their own analytics research or writing.
In part 1, I described three “pen and paper” methods for evaluating players based on performance relative to their teammates. As I mentioned, there is some confusion around what differentiates the relative to team (Rel Team) and relative to teammate (Rel TM) methods (it also doesn’t help that we’re dealing with two metrics that have the same name save four letters). I thought it would be worthwhile to compare them in various ways. The following comparisons will help us explore how each one works, what each tells us, and how we can use them (or which we should use). Additionally, I’ll attempt to tie it all together as we look into some of the adjustments I covered at the end of part 1.
A quick note: WOWY is a unique approach, which limits its comparative potential in this regard. As a result, I won’t be evaluating/comparing the WOWY method further. However, we’ll dive into some WOWYs to explore the Rel TM metric a bit later.
Rel Team vs. Rel TM
Note: For the rest of the article, the “low TOI” adjustment will be included in the Rel TM calculation. Additionally, “unadjusted” and “adjusted” will indicate if the team adjustment is implemented. All data used from here on is from the past ten seasons (’07-08 through ’16-17), is even-strength, and includes only qualified skaters (minimum of 336 minutes for Forwards and 429 minutes for Defensemen per season as estimated by the top 390 F and 210 D per season over this timeframe).
Below, I plotted Rel Team against both the adjusted and unadjusted Rel TM numbers. I have shaded the points based on each skater’s team’s EV Corsi differential in the games that skater played in:
Relative shot metrics have been around for years. I realized this past summer, however, that I didn’t really know what differentiated them, and attempting to implement or use a metric that you don’t fully understand can be problematic. They’ve been available pretty much anywhere you could find hockey numbers forever and have often been regarded as the “best” version of whatever metric they were used for to evaluate skaters (Corsi/Fenwick/Expected Goals). So I took it upon myself to gain a better understanding of what they are and how they work. In part 1, I’ll summarize the various types of relative shot metrics and show how each is calculated. I’ll be focusing on relative to team, WOWY (with or without you), and the relative to teammate methods.
A Brief Summary
All relative shot metrics, whether WOWY, relative to team (Rel Team), or relative to teammate (Rel TM), are essentially trying to answer the same question: how well did any given player perform relative to that player’s teammates? Let’s briefly discuss the idea behind this question and why it was asked in the first place. Corsi, in its usual form of on-ice Corsi For % (abbreviated CF%), is easily the most recognizable statistic outside of the standard NHL-provided boxscore metrics. A player’s on-ice CF% accounts for all shots taken and allowed (Corsi For / (Corsi For + Corsi Against)) when that player was on the ice (if you’re unfamiliar please check out this explainer from JenLC). While this may be useful for some cursory or high-level analysis, it does not account for a player’s team or a player’s teammates.
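For anyone who wants to see the CF% arithmetic spelled out, here is a minimal Python sketch of the formula above. The function name and the example shot-attempt counts are hypothetical and used only for illustration; they are not taken from any real player or data source.

```python
def corsi_for_pct(corsi_for, corsi_against):
    """On-ice CF% as a fraction: Corsi For / (Corsi For + Corsi Against).

    Returns None when no shot attempts were recorded in either direction,
    since the ratio is undefined in that case.
    """
    total = corsi_for + corsi_against
    if total == 0:
        return None
    return corsi_for / total


# Hypothetical skater: 550 shot attempts for and 450 against while on the ice.
example = corsi_for_pct(550, 450)
print(f"CF% = {example:.1%}")
```

Notice that the calculation uses only what happened while the player was on the ice, which is exactly why it cannot separate the player from the team or the teammates around them.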
The first significant breakthrough in hockey analytics occurred in the mid-2000s when analysts discovered the importance of Corsi in describing and predicting future success. Since that time, we’ve seen the creation of expected goals, WAR models, and more. Many have suggested that the next big breakthrough in hockey analytics will come once the NHL is able to provide tracking data. We’ve already seen some of the incredible applications of MLB’s Statcast data and the NBA’s SportVU data. Unfortunately, the NHL has no immediate plans to publicly provide this data, and as such, many analysts have decided to collect it manually.
Never been a meaningful correlation between shot attempts & puck possession. Poor proxy. Shot attempts are valuable but don’t = possession.
— Mike Kelly (@MikeKellyNHL) June 7, 2016
Here you go, Mike, you old stocky codger.
That is meaningful.
Back in October 2015, @asmae_t and I first unveiled an Expected Goals model which proved to be a better predictor of team and player goalscoring performance than any other public model to date. Thanks to the feedback of the community, a few adjustments and corrections were made since then. The changes were the following:
- Score state was a variable that was accounted for in the model but was not explicitly mentioned in the original write-up. Recall that after accounting for all variables, including score state, it was found that a shot attempted by a trailing team still has a lower likelihood of resulting in a goal than a shot taken by a leading team.
- The shot multiplier in Part I of the original write-up was adjusted using a historical weighted average instead of in-season data. Thus, a 2016 shot multiplier for example would be based on the average of the regressed goals (rGoals) and regressed shots (rShots) of 2014 and 2015. This adjustment improved the model’s performance against score-adjusted Corsi and goals % in predicting future scoring, as seen in the graph below. We thank @Cane_Matt again for pointing out this error.
5v5 shots, Senators +3% at Jets. pic.twitter.com/xemFFKZ6e1
— Micah Blake McCurdy (@IneffectiveMath) September 30, 2015
A couple days ago, Micah Blake McCurdy made his first step towards The Great Unknown. It’s a decision hanging on a number of questions we always ask ourselves in the analytics community: What is my work worth to me? What is my work worth to others? For as much time as I spend on it, how can I make sure my work means something, and my time is rewarded? How do I make sure my work stays exactly that: mine?
For the past decade, a number of powerful minds have navigated The Great Unknown, finding that apprehensive teams were only willing to commit peanuts and, on rare occasions, real salaried work after a partnership of a couple years. What made The Great Unknown even more of a mystery was the disappearance of sites, and data, and “stats” groups peddling other people’s work (usually in poor or incorrect fashion), and the discovery by some stats analysts that teams had been tracking data in ways that were curious, tedious, unhelpful. When the so-called Summer of Analytics occurred, The Great Unknown had the curtain pulled back a little bit: we started knowing who was getting hired where. But that peek exposed the still-immense uncertainty of the work available with some teams, and opened a new area of intrigue: analytics writing.
So why is what Micah is doing so important?