Combining Manually-Tracked Data with Play-by-Play Data

This post assumes beginner knowledge of R.

If you’ve ever analyzed hockey data, then you’re probably familiar with the NHL’s Real Time Scoring System, which produces what’s more commonly known as play-by-play data. These data are publicly available and allow us to see every event recorded by the NHL in a given game. Shown below are selected details about the first 10 events from two games on February 18, 2019: Tampa Bay at Columbus and Vegas at Colorado.


The Importance of Pressure for a Successful Forecheck

Most of my posts so far have talked about zone exits from the perspective of the team trying to break out of their defensive zone. Now, let’s flip the script and discuss the team on the forecheck. This team does not have possession of the puck, but they are in their offensive zone, which is an advantage. So, how can they regain control?


Team Level Zone Exits

From past posts, we have a general sense of the basics of zone exits: zone exits are important because they get you out of your zone and towards an opportunity to score. The key to a successful zone exit is maintaining possession, ideally by avoiding the temptation to dump the puck out.

But so far, we have only looked at zone exits league-wide. Most fans care about one particular team more than the rest, yet we haven’t looked at team-level results at all. So today, let’s see how each team has performed at zone exits over the past three seasons.


Expected Goals Model with Pre-Shot Movement, Part 3: 2018-2019 Data

Yesterday we looked at the team and skater results from the 2016 – 2018 data that was used to train the xG model. That’s a pretty robust dataset, but it’s unfortunately a bit out of date. People care about this season, and past years are old news. So let’s take a look at the data that Corey Sznajder has tracked for 2018 – 2019 so far.


Expected Goals Model with Pre-Shot Movement, Part 2: Historic Team and Player Results

Intro

In the last post, we introduced a new expected goals (xG) model. It incorporates pre-shot movement, which made it more accurate than existing public xG models when predicting which shots would be goals. However, we use xG models for far more than looking at individual shots. By aggregating expected goals at the player and team level, we can get a better sense of how each of them performs.


Expected Goals Model with Pre-Shot Movement, Part 1: The Model

There are few questions in hockey analytics more fundamental than who played well. Consequently, a large portion of hockey analysis has focused on how best to measure results. This is some of the best-known work in “fancy stats”: when evaluating players and teams, many people who used to look at goals scored moved to Corsi and then to expected goals (xG).

The concept of an xG model is simple: look at the results of past shots to predict whether or not a particular shot will become a goal. Then credit the player who took the shot with that “expected” likelihood of scoring, regardless of whether or not it went in. Several such models have been developed, including those by Emmanuel Perry, Evolving Wild, and Moneypuck, among many others.
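To make that idea concrete, here is a minimal Python sketch. It is not any of the models named above: it assumes a made-up handful of past shots, described only by a hypothetical shot type and distance bucket, and uses the raw historical goal rate for each combination as the “expected” likelihood. Real xG models use far more features and a proper statistical model.

```python
from collections import defaultdict

# Hypothetical past shots: (shot_type, distance_bucket, was_goal).
# All values are invented for illustration only.
past_shots = (
    [("wrist", "close", True)] * 1 + [("wrist", "close", False)] * 3
    + [("slap", "far", True)] * 1 + [("slap", "far", False)] * 9
)


def fit_goal_rates(shots):
    """Return the historical share of shots that became goals, per shot profile."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for shot_type, distance, was_goal in shots:
        key = (shot_type, distance)
        attempts[key] += 1
        goals[key] += int(was_goal)
    return {key: goals[key] / attempts[key] for key in attempts}


def credit_shooters(new_shots, goal_rates):
    """Credit each shooter with the expected likelihood of every shot they took.

    Whether the shot actually went in is deliberately not an input here.
    """
    xg = defaultdict(float)
    for shooter, shot_type, distance in new_shots:
        xg[shooter] += goal_rates[(shot_type, distance)]
    return dict(xg)


goal_rates = fit_goal_rates(past_shots)

# Hypothetical new shots: (shooter, shot_type, distance_bucket).
new_shots = [
    ("Player A", "wrist", "close"),
    ("Player A", "slap", "far"),
    ("Player B", "slap", "far"),
]
player_xg = credit_shooters(new_shots, goal_rates)
print(player_xg)  # Player A: about 0.35 xG, Player B: about 0.10 xG
```

In this toy data a close wrist shot scored 1 time in 4 and a far slap shot 1 time in 10, so Player A is credited with roughly 0.25 + 0.10 expected goals no matter how either shot turned out.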

However, there remains additional room for improving these models. They do impressive work based on the available play-by-play (pbp) data, but that only captures so much. There are big gaps in information, and we know that filling them would make us better at predicting goals.

Perhaps the biggest gap is pre-shot movement. We know that passes before a shot affect the quality of the scoring chance, but the pbp data does not include them. Thankfully, Corey Sznajder’s data does. While it does not cover every single shot over multiple seasons, it is a substantial dataset; when I pulled the data for this model, it had roughly half of the 2016-2017 and 2017-2018 seasons included: 72 thousand shots from 1,085 games. While the number of games tracked varies by team, we have at least 43 for every team except Vegas, for which we have 26. We can use this data to build the first public xG model that incorporates passes.


Why Possession is the Key to Zone Exits

If there’s anything you know from neutral zone analytics, it’s probably this: carry-in zone entries are better than dump-ins. In the linked piece, Eric Tulsky finds that “maintaining possession of the puck at the blue line (carrying or passing the puck across the line) means a team will generate more than twice as much offense as playing dump and chase”.

But what about zone exits? Is possession equally important there? Work by Jen Lute Costello suggests that it is, but her data was limited to one playoff series. Today, I’ll expand on her work to show that maintaining possession is crucial for successful zone exits, and breakouts should be structured with this in mind.


Introduction to the Transition Project

This is one of my favorite plays:

Almost every team is coached to make their opponent fight for every inch. Skjei’s end-to-end rush cuts through those defenses and leaves his team in a much better position than when he started.

But just how much better off did he leave them? How does that compare to alternative outcomes? And which players are the best at making these plays? We have many unanswered questions about transitional play. We’d like to study them in more detail, but a rush like the one in the gif above doesn’t appear anywhere in the league’s play-by-play data, so that data can’t help us analyze it.


Wins Above Replacement: The Process (Part 2)

In part 1, we covered WAR in hockey and baseball, discussed each field’s prior philosophies, and cemented the goals for our own WAR model. This part will be devoted to the process: how we assign value to players over multiple components that sum to a total value for any given player. We’ll cover the two main modeling aspects and how we adjust for overall team performance. Given our affinity for baseball’s philosophy and the overall influence it’s had on us, let’s first briefly go back to baseball and look at how they do it.


Reviving Regularized Adjusted Plus-Minus for Hockey

Introduction

In this piece we will cover Adjusted Plus-Minus (APM) / Regularized Adjusted Plus-Minus (RAPM) as a method for evaluating skaters in the NHL. Some of you may be familiar with this process – both of these methods were developed for evaluating players in the NBA and have since been modified to do the same for skaters in the NHL. We first need to acknowledge the work of Brian Macdonald. He proposed how the NBA RAPM models could be applied for skater evaluation in hockey in three papers on the subject: paper 1, paper 2, and paper 3. We highly encourage you to read these papers as they were instrumental in our own development of the RAPM method.

While the APM/RAPM method is established in the NBA and to a much lesser extent the NHL, we feel that revisiting the history, process, and implementation of the RAPM technique is overdue, especially for hockey. This method has become the go-to public framework for evaluating a given player’s value within the NBA. There are multiple versions of the framework, which we can collectively call “regression analysis”, but APM was the original method developed. The goal of this type of analysis (APM/RAPM) is to isolate a given player’s contribution while on the ice, independent of all the other factors that we can account for. Put simply, this allows us to better measure the individual performance of a given player in an environment where many factors can impact their raw results. We will start with the history of the technique, move on to a demonstration of how linear regression works for this purpose, and finally cover how we apply this to measuring skater performance in the NHL.
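As a rough illustration of the regression idea, here is a minimal Python sketch. It is a toy under stated assumptions, not our model or the one in the papers above: three hypothetical skaters, three invented stints, and a made-up result for each stint. Each row marks who was on the ice, and the regression splits the stint results among the skaters. Solving it without a penalty plays the role of APM, and adding a ridge penalty (the hypothetical `lam` parameter) plays the role of RAPM.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # Build the normal equations A @ beta = b.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


# Columns: hypothetical skaters A, B, C. A 1 means the skater was on the ice.
X = [
    [1, 1, 0],  # stint 1: A and B together
    [1, 0, 1],  # stint 2: A and C together
    [0, 1, 1],  # stint 3: B and C together
]
# Invented result for each stint (for example, a shot differential rate).
y = [3.0, 1.0, 2.0]

apm = ridge_fit(X, y, lam=0.0)   # no penalty: approximately [1.0, 2.0, 0.0]
rapm = ridge_fit(X, y, lam=1.0)  # ridge penalty: approximately [0.8, 1.3, 0.3]
print(apm, rapm)
```

No skater here ever plays alone, yet the regression still assigns each one an individual value. The penalized fit keeps the same ordering but pulls every estimate toward zero, which is the basic trade that regularization makes.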
