Exploratory Data Analysis Using Tidyverse

This post assumes beginner knowledge of R.

Welcome to the second article in our series on basic data cleaning and data manipulation! In this article, we’re going to use play-by-play data from two NHL games and answer two questions:

  • which power play unit generated the best shot rate in each game?
  • which defenseman played the most 5v5 minutes in each game?

In the process of doing so, we’ll cover several topics of basic data manipulation in the tidyverse, including using functions, creating joins, grouping and summarizing data, and working with string data.

Combining Manually-Tracked Data with Play-by-Play Data

This post assumes beginner knowledge of R.

If you’ve ever analyzed hockey data, then you’re probably familiar with the NHL’s Real Time Scoring System, which produces what’s more commonly known as play-by-play data. These data are publicly available and allow us to see every event recorded by the NHL in a given game. Shown below are selected details about the first 10 events from two games on February 18, 2019: Tampa Bay at Columbus and Vegas at Colorado.

The Importance of Pressure for a Successful Forecheck

Most of my posts so far have talked about zone exits from the perspective of the team trying to break out of their defensive zone. Now, let’s flip the script and discuss the team on the forecheck. This team does not have possession of the puck, but they are in their offensive zone, which is an advantage. So, how can they regain control?

Team Level Zone Exits

From past posts, we have a general sense of the basics of zone exits: zone exits are important because they get you out of your zone and towards an opportunity to score. The key to a successful zone exit is maintaining possession, ideally by avoiding the temptation to dump the puck out.

But so far, we have only looked at zone exits league-wide. Most fans care about one particular team more than the rest, but we haven’t looked at team-level results at all. So today, let’s see how each team has performed at zone exits over the past three seasons.

Visualizing and Quantifying Passing on the Power Play

Visualizing passes isn’t easy in hockey. In any given KHL game, there are between 700 and 900 passes. Somewhere between 65% and 85% are successful*. If you wanted to focus on just the successful ones, you’d have to find a way to meaningfully and concisely represent 500-700 events. Let’s start with something simpler: the power play. If we further restrict our target to passes by a single team during 5v4 power plays in the offensive zone (OZ), we still get between 40 and 50 passes per game per team. Looking at two random KHL games, you can see that this is still quite a lot of passes:

There are some trends to be picked up on, but it’s not very clean. And any semi-serious opposition scouting (especially of special teams) will take into account multiple games, which then leads to an unidentifiable mess when plotted.

So You Got Accepted To Present at a Sports Analytics Conference

First of all, congratulations if you got accepted. Kudos to you if you got accepted to a conference like RITSAC, a very well-run and well-curated conference. This is a wonderful accomplishment, and you should be proud. Tell your friends and family. Celebrate. Bask in the adoration.

Well, maybe not that last part. But you get my point. Your work clearly has some perceived value and is based on solid reasoning and data analysis.

So now what?

Revisiting NWHL Game Score

In March 2018, Shawn Ferris of Hockey Graphs introduced his NWHL Game Score, which was based on Dom Luszczyszyn’s NHL Game Score. It was groundbreaking work in women’s hockey analytics, which is still very much in its infancy — especially at the professional level.

Game score is a valuable tool that can give us a better understanding of a player’s performance than points for skaters or save percentage and goals against average for goaltenders. It provides us with a single value that incorporates relevant points of data which we can use to compare the performances of two or more players in a single game or over the course of many games, including seasons and careers.

As Shawn noted in his work, game score is particularly valuable for analyzing performance in the NWHL because of the brevity of the regular season. Through the league’s first four seasons, the average length of a season was under 18 games. The 2019-20 season promises a schedule of 24 games, which is still less than a third of the length of the NHL season. That brief schedule creates an opportunity for shooting percentage factors to influence both a player’s production and our perception of their performance.
