Quantifying the Value of an NHL Timeout using Survival Analysis: Part 1

I’d like to thank Luke Benz, my mentor via the Hockey Graphs Mentorship Program, for all of his help in developing this project.

Introduction

Hockey, by nature, is a fast-paced sport that can be difficult to represent by discrete situations. While most other professional sports can be viewed as combinations of distinct in-game events – at-bats in baseball, plays and series in football, and even possessions in basketball – hockey is extremely fluid, with a constantly changing game state. This difference in game flow means that there are far fewer opportunities for a hockey coach to make any decisions based on distinct game states. While, for example, a football coach has several opportunities per game to decide whether or not to attempt a fourth-down conversion, a hockey coach has very few chances to make any comparable choice that can affect the outcome of the game. However, there are a few tools available to a hockey coach that can be researched so as to optimize their effectiveness in helping a team to win a game.

The most-researched of these decisions (thus far) for an NHL coach is when to pull the goalie in an endgame situation. There have been several papers published regarding the optimal time to pull the goalie, such as these two by Beaudoin and Swartz in 2010 and by Brown and Asness in 2018. (For even more great work on goalie pull times, you can check out Meghan Hall’s talk from the 2019 Seattle Hockey Analytics Conference and her Tableau dashboard, as well as the Goalie Pull Twitter Bot created by Rob Vollman and MoneyPuck.com.) All of this prior research has found that NHL teams should pull their goalies much sooner than conventional wisdom suggests, as teams are much more likely to score to tie the game if they pull their goalie earlier rather than later.

However, beyond pulling the goalie, there are still a few more tools at a coach’s disposal. Teams are allowed to challenge goals for certain rule infractions, use a 30-second timeout during a stoppage in play, or switch goalies if the starter is having a bad game, in addition to personnel decisions regarding line combinations or matching up players against the other team. This article focuses on timeout usage, but I plan to explore the other tools in future work.

Continue reading

The State of Goalie Pulling in the NHL

When people ask me how to get into sports analytics, I always suggest starting with a question that they’re interested in exploring and using that question as a framework for learning the domain knowledge and the technical skills they need. I feel comfortable giving this advice because it’s exactly how I got into hockey analytics: I was curious about goalie pulling, and I couldn’t find enough data to satisfy my curiosity. There are plenty of articles on when teams should pull their goalies, but aside from a 2015 article on FiveThirtyEight by Michael Lopez and Noah Davis, I couldn’t find much data on when NHL teams were actually pulling their goalies and if game trends were catching up to the mathematical recommendations. I presented some data on the topic at the Seattle Hockey Analytics Conference in March 2019, but the following analysis is broader and includes more seasons of data.

Data collection notes

  • All raw play-by-play data is courtesy of Evolving-Hockey and their scraper.
  • Data includes all regular season games from 2013-14 onward. All 2019-20 data is up until the season pause, through March 11, 2020.
  • Only the first goalie pull per team in each game is counted for the average times. For example, if a team pulled their goalie while trailing by two and then later in the game pulled their goalie again while trailing by one, only the first instance is included in the average times. All extra attacker time is counted for the scoring rates.
  • More details on this data set, particularly at the team level, are available here.
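
The "first pull only" rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the analysis: the record layout, team codes, and numbers are all hypothetical, and it assumes every pull happens in regulation so that "seconds remaining" orders the pulls within a game.

```python
from statistics import mean

# Hypothetical records:
# (game_id, team, seconds_remaining_when_pulled, goal_deficit)
pulls = [
    ("G1", "SEA", 150, 2),  # first pull, trailing by two
    ("G1", "SEA", 60, 1),   # second pull in the same game, ignored for averages
    ("G2", "SEA", 95, 1),
    ("G2", "CBJ", 120, 1),
]

def first_pulls(records):
    """Keep only the earliest pull (most time remaining) per (game, team)."""
    first = {}
    for game_id, team, secs_left, deficit in records:
        key = (game_id, team)
        if key not in first or secs_left > first[key][2]:
            first[key] = (game_id, team, secs_left, deficit)
    return list(first.values())

kept = first_pulls(pulls)
avg_secs_left = mean(record[2] for record in kept)
```

Scoring rates would be computed from the unfiltered `pulls`, since all extra attacker time counts there.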
Continue reading

Introducing NWHLe and Translation Factors

In April 2017, Rob Vollman tweeted out what he called “rough and preliminary” translation factors for women’s hockey. At the time, I was playing around with counting stats from two years of NWHL and CWHL hockey, and wanted to develop as many tools and resources as I could to better understand the women’s game. Curious to know what the competitive landscape of post-collegiate hockey looked like in North America and elsewhere, I began to keep track of data with the intention of building on Rob’s translation factors.

The world of women’s hockey in North America has changed dramatically in the three years since Rob’s tweet. My initial plans went up in smoke when the CWHL suddenly folded after the 2018-19 season. As a result, I shifted my focus to developing NWHL equivalency factors – or NWHLe – for NCAA DI, NCAA DIII, and USports. Unfortunately, it quickly became apparent that the sample size of USports alumnae to play a significant number of games in the NWHL was too small to work with.

Continue reading

Using Sequences for Analysis: Expected Goals Contribution and more

In a previous article, I presented a way to cut and slice a hockey game into Sequences. A Sequence extends from the moment a team gets control of the puck and starts moving forward, to the moment the team loses it for good. The objective was to measure the importance of every event happening between the beginning of a Sequence and its end, from a zone exit to any shot attempts, to a zone entry or any high-danger passes in between. If a Sequence includes one or several shot attempts, its value is the sum of the Expected Goals of all those attempts.

The natural follow-up was the creation of an Expected Goals Contribution metric for players.

The thinking behind it was to answer one of the two main questions we face in the daily use of analytics with coaches: What is the real contribution of each player? Overall, there are the well-known GAR or WAR type of metrics, but these are beyond the comprehension of many staffs as they are not tangible enough for daily use.

Now, if we use Sequences where the team has possession of the puck, it means Expected Goals Contribution would only look at the offensive side of the game. Still, instead of looking separately at transition or shooting stats to evaluate a player, the objective is to sum all offensive efforts into one metric, weighting those efforts (zone exit, entry, etc.) according to their contribution to the Sequence. It also makes playmaking more apparent statistically.

In other words, it means sharing the total value of the Sequence (in terms of Expected Goals) between the players responsible. This is what we called Expected Goals Contribution.
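
As a rough sketch of what that sharing could look like, the Python below splits a Sequence's Expected Goals between players in proportion to the weight of each event they recorded. The weights, player labels, and the function name `share_sequence_value` are hypothetical placeholders; the actual weighting used for Expected Goals Contribution is not given in this excerpt.

```python
# Hypothetical weights per event type; the real values are not given in the text.
EVENT_WEIGHTS = {"zone_exit": 1.0, "zone_entry": 2.0, "pass": 3.0, "shot": 4.0}

def share_sequence_value(sequence_xg, events, weights=EVENT_WEIGHTS):
    """Split a Sequence's Expected Goals between players by event weight.

    events: list of (player, event_type) tuples recorded in the Sequence.
    """
    total_weight = sum(weights[event_type] for _, event_type in events)
    if total_weight == 0:
        return {}
    shares = {}
    for player, event_type in events:
        share = sequence_xg * weights[event_type] / total_weight
        shares[player] = shares.get(player, 0.0) + share
    return shares

# One Sequence worth 0.24 Expected Goals, with four recorded events.
events = [("D1", "zone_exit"), ("C1", "zone_entry"), ("C1", "pass"), ("W1", "shot")]
contributions = share_sequence_value(0.24, events)
```

With these placeholder weights, the player with both the entry and the pass ends up with half of the Sequence's value, which is how playmaking becomes visible in the metric.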

Continue reading

Using Data to Inform Shorthanded Neutral Zone Decisions

The following data is all at 4-on-5 with both goalies in their nets. A special thanks to Evolving Hockey for data and their scraper.

In March of 2019, Mike Pfeil coined the term “powerkill” at the Seattle Hockey Analytics Conference. It was only a small part of his presentation, but it seemed to motivate Meghan Hall and Alison Lukan. In the coming months, Lukan would write about how the Columbus Blue Jackets utilized an aggressive approach in their penalty killing system, while Hall would present at RITSAC and OTTHAC before they finally came together to present at the Columbus Blue Jackets Hockey Analytics Conference in February.

Looking to continue researching this phenomenon, I set out to answer a few questions I had. In order to give shots some added context beyond what the NHL’s public data supplies, throughout the last few months, I tracked shot assists and where possessions leading to shots had started. As a side benefit, I was also able to filter out shots that didn’t appear to exist, were recorded incorrectly, or where the possession started at 4-on-4.

In 2016, Matt Cane developed a metric to approximate penalty kill aggressiveness by adding together a penalty kill’s own controlled and failed entries and dividing that sum by the entries it faces from its opponent. The theory is that penalty kills that attempt to control more entries into the offensive zone are inherently more aggressive. Hall and Lukan also found that a penalty kill’s rate of controlled entries has a strong correlation to the rate at which they take shots.
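
For readers who prefer to see the arithmetic, here is a minimal Python sketch of the aggressiveness ratio as described above. The function name and the sample counts are hypothetical, and the original metric may include details not covered in this summary.

```python
def pk_aggressiveness(controlled_for, failed_for, entries_against):
    """(controlled + failed entries by the penalty kill) / entries it faces."""
    if entries_against <= 0:
        raise ValueError("entries_against must be positive")
    return (controlled_for + failed_for) / entries_against

# Hypothetical team: 12 controlled and 6 failed entries while shorthanded,
# against 90 entries by opposing power plays.
ratio = pk_aggressiveness(12, 6, 90)
```

A higher ratio means the penalty kill tries to carry the puck into the offensive zone more often relative to how much it defends.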

Part of the reason these two stats have such a strong correlation is that the vast majority of shots require a zone entry. Not including rebound shots, 82% of 4v5 shots stemmed from possessions starting outside of the offensive zone over the course of the 2019-20 season.

Continue reading

By the numbers: thinking about the World Championships a different way

This post was co-authored by Shayna Goldman and Alison Lukan

As part of the global response to the COVID-19 pandemic, the 2020 World Championship was cancelled. But, we still wanted to see how rosters for an international tournament with NHLers could have shaken out. While it’s easy to just put together an All Star lineup for most countries, we wanted to add a twist: each country’s roster could only include NHL players and each team had to be compliant with the 2019-20 salary cap. 

So what does this look like? A little bit about our process, first.

Six teams will compete in our fictitious tournament: Canada, USA, Sweden, Finland, Russia, and Europe. Each roster consists of 12 forwards, six defenders, and two goaltenders. Because we were limited to NHL players, talent from outside of those core countries in Europe was combined to form one super team. 

Continue reading

Introducing Offensive Sequences and The Hockey Decision Tree

If you ever work for a hockey team as an analyst, you could be facing two very recurrent questions from the coaching staff. The first one is very practical: How can analytics help us work better and faster? The second one is: What is the real contribution of each player? That means going beyond the usual on-ice “possession” stats like Corsi or Expected Goals and individual production metrics such as shots taken, scoring chances, expected goals created, zone exits, entries, or even high-danger passes (passes that end or go through the slot). Those events, however, were not yet statistically linked to each other. Finding a way to provide answers to both questions was my goal for the last few months, and the solution was: I needed to split the game into “Sequences”.

Video coaches often break down game tape to highlight certain plays, such as a rush-based attack or a zone exit under pressure. I wanted to do the same and divide a game into as many parts as necessary, or “Sequences”. Roughly, every time the puck changes possession between teams, a new “Sequence” begins. That’s about 250 Sequences per game.

Looking at this from the point of view of the team that owns the puck, offensive Sequences extend from the moment a team gets control of the puck and starts moving forward to the moment it loses it for good, and a Sequence must include a shot attempt to have a positive value. How does this work? Let’s say a player gets the puck back in his defensive zone and tries a zone exit but fails. The Sequence starts over, since there can only be one exit recorded in a Sequence. He then tries another zone exit and succeeds, the team gets into the offensive zone and records a couple of shot attempts, then loses the puck. If the other team gets enough control of the puck to try a zone exit, that marks the end of the Sequence.

How does this help? Well, the basic principle is to see the total value of a Sequence. We use Expected Goals as our measure of “value”. To do that, we add the Expected Goals of the shot attempts in the Sequence. For example, a Sequence with two shot attempts:

  • A high danger shot: 0.23 Expected Goals
  • A shot from the blue line: 0.01 Expected Goals
  • Total Sequence value: 0.23 + 0.01 = 0.24 Expected Goals
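
The same sum, written as a tiny Python sketch. The function name `sequence_value` is only illustrative; the shot values are the two from the example above.

```python
def sequence_value(shot_xg):
    """Total value of a Sequence: the sum of its shot attempts' Expected Goals."""
    return sum(shot_xg)

# The two shot attempts from the example: a high-danger shot and a point shot.
shots = [0.23, 0.01]
total = sequence_value(shots)
print(f"{total:.2f}")  # prints 0.24

# A Sequence with no shot attempt has no positive value.
empty_total = sequence_value([])
```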

Continue reading

Which League is Best?

This work is co-authored with Madeline Gall.

While scouting for some sports is straightforward (college football → NFL), scouting for the NHL can be a more arduous process. With players from more than 45 international ice hockey leagues, each with its own regulations and difficulties, how can one adequately assess the quality of a player’s performance? Comparisons between leagues are not easily made; 18 points for an eighteen-year-old playing against other eighteen-year-olds in a minor league should not be attributed the same value as 18 points for an eighteen-year-old playing against veterans in the NHL. 

There have been other attempts to account for this, including player translation variables like Rob Vollman’s hockey translation factors and Gabriel Desjardins’ NHL Equivalency Ratings (NHLe). Desjardins’ NHLe previously tackled the issue of comparing and predicting player performance for League-to-NHL transitions (moving from another league into the NHL). It was great for a quick, general comparison and certainly has its advantages (easy and quick to calculate), but there are some drawbacks to its method. For starters, it didn’t necessarily control for team quality, position, and age. Translation factors are calculated using statistics from players who have played at least 20 games in the given league before playing at least 20 in the NHL. That means there’s a lot of valuable data about these in-between transitions that isn’t being used. 
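
To make the 20-game rule concrete, here is a hedged Python sketch of one common way to compute a translation factor: for each eligible player, divide NHL points per game by points per game in the other league, then average those ratios. Averaging per-player ratios is an assumption on our part (published versions differ in the details), and the player rows are made up.

```python
MIN_GAMES = 20  # threshold quoted in the text

# Hypothetical rows: (player, league_gp, league_points, nhl_gp, nhl_points)
players = [
    ("A", 50, 50, 40, 20),
    ("B", 30, 15, 25, 5),
    ("C", 60, 70, 10, 4),  # only 10 NHL games, so excluded
]

def translation_factor(rows, min_games=MIN_GAMES):
    """Average ratio of NHL points per game to league points per game."""
    eligible = [
        r for r in rows
        if r[1] >= min_games and r[3] >= min_games and r[2] > 0
    ]
    if not eligible:
        raise ValueError("no eligible players")
    ratios = [(r[4] / r[3]) / (r[2] / r[1]) for r in eligible]
    return sum(ratios) / len(ratios)

factor = translation_factor(players)
```

Player C illustrates the drawback described above: a full season of league data is discarded because the NHL sample is too short.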

In this project, we introduce a new method for comparing and projecting player performance across leagues using an adjusted z-score metric that would account for these drawbacks. This metric controls for factors such as age, league, season, and position that affect a player’s P/PG metric, and could be applied to any league of interest. This new metric is necessary as there are many characteristics that vary from league to league. Due to the different playing styles and opponent difficulty, there is not one consistent metric to make comparable evaluations of player performance for hockey leagues around the world. Other factors such as goalie strength, penalty rates, and rink dimensions are also inconsistent across international leagues. Scenarios could occur in which players of similar strength could appear to have seemingly different performances.
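
As a simplified illustration of the idea, the Python sketch below computes a plain z-score of P/PG within each (league, season, position, age) group. This is not the adjusted metric itself, whose details are beyond this excerpt; it only shows the basic standardization step, and the player rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows: (player, league, season, position, age, points_per_game)
rows = [
    ("A", "SHL", "2018-19", "F", 19, 0.80),
    ("B", "SHL", "2018-19", "F", 19, 0.40),
    ("C", "SHL", "2018-19", "F", 19, 0.60),
    ("D", "OHL", "2018-19", "F", 18, 1.50),
    ("E", "OHL", "2018-19", "F", 18, 0.90),
]

def grouped_z_scores(rows):
    """Z-score each player's P/PG against peers in the same group."""
    groups = defaultdict(list)
    for player, league, season, position, age, ppg in rows:
        groups[(league, season, position, age)].append((player, ppg))
    z = {}
    for members in groups.values():
        values = [ppg for _, ppg in members]
        mu, sigma = mean(values), pstdev(values)
        for player, ppg in members:
            # A group with no spread carries no information; score it as 0.
            z[player] = (ppg - mu) / sigma if sigma > 0 else 0.0
    return z

z_scores = grouped_z_scores(rows)
```

Here player D's 1.50 P/PG and player A's 0.80 P/PG are each judged against their own peer group, rather than against each other directly.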

Continue reading

An Introduction to R With Hockey Data

I have written a couple articles over the past few months on using R with hockey data (see here and here), but both of those articles were focused on intermediate techniques and presumed beginner knowledge of R. In contrast, this article is for the complete beginner. We’ll go through the steps of downloading and setting up R and then, with the use of a sample hockey data set, learn the very basics of R for exploring and visualizing data.

One of the wonderful things about using R is that it’s a flexible, growing language, meaning that there are often many different ways to get to the same, correct result. The examples below are meant to be a gentle introduction to different parts of R, but please know that this really only scratches the surface of what’s available.

The code used for this tutorial (which also includes more detail and more examples) is available on our Github here.

Downloading R and Getting Set Up

Continue reading

Lateral Puck Movement in the NZ

Research shows that lateral/”east-west” puck movement in the offensive zone is beneficial to increasing one’s odds of scoring. But I have now heard from people in various positions within the hockey industry on why it might also be useful to generate east-west puck movement in the neutral zone. The theories – focused on lateral passing, lane changes and stretch passes, respectively – all boiled down to one point: When you rush the puck up ice, the defending team will focus on that side, leaving the other side of the ice somewhat more open, so there might be open ice to exploit.

Continue reading