I’ve been fortunate enough to attend the RIT Sports Analytics Conference for each of the last three years. The first year I went, I was nervous about meeting people whose work I admired. I was afraid that nobody would want to talk to a newcomer whom few people knew and who was just starting to learn about the field.
Welcome to the second article in our series on basic data cleaning and data manipulation! In this article, we’re going to use play-by-play data from two NHL games and answer two questions:
which power play unit generated the best shot rate in each game?
which defenseman played the most 5v5 minutes in each game?
In the process, we’ll cover several topics of basic data manipulation in the tidyverse, including using functions, joining data, grouping and summarizing data, and working with string data.
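Below is a minimal sketch of the grouping-and-summarizing step behind the first question. It is written in Python rather than the article's R/tidyverse. It assumes the play-by-play has already been reduced to power-play stints. Each stint carries a hypothetical `unit` label, its duration in seconds, and the shots taken. Those field names and the sample numbers are illustrative, not drawn from the two games.

```python
from collections import defaultdict


def shots_per_60_by_unit(stints):
    """Sum time and shots per power-play unit, then express shots per 60 minutes.

    Each stint is a dict with hypothetical keys 'unit', 'seconds', and 'shots'.
    """
    seconds = defaultdict(float)
    shots = defaultdict(int)
    for stint in stints:
        seconds[stint["unit"]] += stint["seconds"]
        shots[stint["unit"]] += stint["shots"]
    # shots per 60 = shots / (seconds / 3600); skip units with no recorded time
    return {unit: shots[unit] * 3600 / secs for unit, secs in seconds.items() if secs > 0}


def best_unit(stints):
    """Return the unit label with the highest shot rate."""
    rates = shots_per_60_by_unit(stints)
    return max(rates, key=rates.get)


# Illustrative stints for one game (made-up numbers)
stints = [
    {"unit": "PP1", "seconds": 120, "shots": 4},
    {"unit": "PP2", "seconds": 60, "shots": 1},
    {"unit": "PP1", "seconds": 60, "shots": 2},
]
print(shots_per_60_by_unit(stints))  # {'PP1': 120.0, 'PP2': 60.0}
print(best_unit(stints))  # PP1
```

Answering "in each game" means running this once per game. The second question follows the same pattern, summing 5v5 seconds per defenseman instead of shots per unit.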
If you’ve ever analyzed hockey data, then you’re probably familiar with the NHL’s Real Time Scoring System, which produces what’s more commonly known as play-by-play data. These data are publicly available and allow us to see every event recorded by the NHL in a given game. Shown below are selected details about the first 10 events from two games on February 18, 2019: Tampa Bay at Columbus and Vegas at Colorado.
Most of my posts so far have talked about zone exits from the perspective of the team trying to break out of their defensive zone. Now, let’s flip the script and discuss the team on the forecheck. This team does not have possession of the puck, but they are in their offensive zone, which is an advantage. So, how can they regain control?
From past posts, we have a general sense of the basics of zone exits: they matter because they get you out of your own zone and toward an opportunity to score. The key to a successful zone exit is maintaining possession, ideally by avoiding the temptation to dump the puck out.
But so far, we have only looked at zone exits league-wide. Most fans care about one particular team more than the rest, but we haven’t looked at team-level results at all. So today, let’s see how each team has performed at zone exits over the past three seasons.
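One way to move from league-wide to team-level numbers is to group exits by team and compute the share that kept possession. The sketch below assumes each exit is logged with a team and a hypothetical exit `type`: 'carry', 'pass', 'dump', or 'failed'. It treats carries and passes as possession exits. These labels and the sample data are placeholders, not the tracking scheme behind these posts.

```python
from collections import Counter

# Hypothetical labels for exits that leave the zone with possession
POSSESSION_TYPES = {"carry", "pass"}


def possession_exit_rate_by_team(exits):
    """Return {team: share of that team's exit attempts that kept possession}."""
    attempts = Counter()
    controlled = Counter()
    for exit_ in exits:
        attempts[exit_["team"]] += 1
        if exit_["type"] in POSSESSION_TYPES:
            controlled[exit_["team"]] += 1
    return {team: controlled[team] / n for team, n in attempts.items()}


# Made-up exits for two hypothetical teams
exits = [
    {"team": "Team A", "type": "carry"},
    {"team": "Team A", "type": "dump"},
    {"team": "Team A", "type": "pass"},
    {"team": "Team B", "type": "pass"},
    {"team": "Team B", "type": "failed"},
    {"team": "Team B", "type": "dump"},
    {"team": "Team B", "type": "pass"},
]
rates = possession_exit_rate_by_team(exits)
# Rank teams from best to worst possession exit rate
for team, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {rate:.1%}")
```

Splitting by season as well would only mean grouping on `(team, season)` tuples instead of team names.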
Visualizing passes isn’t easy in hockey. In any given KHL game, there are between 700 and 900 passes, and somewhere between 65% and 85% of them are successful*. If you wanted to focus on just the successful ones, you’d have to find a way to meaningfully and concisely represent roughly 500-700 events. Let’s start with something simpler: the power play. If we further restrict our target to passes by a single team during 5v4 power plays in the OZ, we still get between 40 and 50 passes per game per team. Looking at two random KHL games, you can see that this is still quite a lot of passes:
There are some trends to pick up on, but the picture isn’t very clean. And any semi-serious opposition scouting (especially of special teams) will take multiple games into account, which produces an unreadable mess when plotted.
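The restriction described above (successful passes by one team, in the offensive zone, at 5v4) is a plain filter over the event list. Here is a sketch that assumes hypothetical event fields, with skater counts recorded from the passing team's perspective. Real KHL or tracking data will name and encode these differently.

```python
def pp_oz_passes(events, team):
    """Keep successful offensive-zone passes by `team` while it skates 5v4.

    Hypothetical keys: 'type', 'team', 'successful', 'zone',
    'skaters_for', 'skaters_against' (counts from the passing team's side).
    """
    return [
        e for e in events
        if e["type"] == "pass"
        and e["team"] == team
        and e["successful"]
        and e["zone"] == "OZ"
        and e["skaters_for"] == 5
        and e["skaters_against"] == 4
    ]


# Made-up events: only the first one passes every condition
events = [
    {"type": "pass", "team": "Home", "successful": True, "zone": "OZ", "skaters_for": 5, "skaters_against": 4},
    {"type": "pass", "team": "Home", "successful": False, "zone": "OZ", "skaters_for": 5, "skaters_against": 4},
    {"type": "pass", "team": "Home", "successful": True, "zone": "NZ", "skaters_for": 5, "skaters_against": 4},
    {"type": "pass", "team": "Home", "successful": True, "zone": "OZ", "skaters_for": 5, "skaters_against": 5},
    {"type": "shot", "team": "Home", "successful": True, "zone": "OZ", "skaters_for": 5, "skaters_against": 4},
]
print(len(pp_oz_passes(events, "Home")))  # 1
```

Pooling several games means concatenating their filtered lists, which is exactly the point where a plot of every pass becomes crowded.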
First of all, congratulations if you got accepted, and extra kudos if it was to a conference like RITSAC, which is very well run and well curated. This is a wonderful accomplishment, and you should be proud. Tell your friends and family. Celebrate. Bask in the adoration.
Well, maybe not that last part. But you get my point. Your work clearly has some perceived value and is based on solid reasoning and data analysis.
Game score is a valuable tool that can give us a better understanding of a player’s performance than points alone for skaters, or save percentage and goals against average for goaltenders. It provides a single value that incorporates the relevant data points, which we can use to compare the performances of two or more players in a single game or over the course of many games, including seasons and careers.
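At its core, a game score of this kind is a weighted sum of box-score events. The sketch below shows that structure only. The stat names and weights are hypothetical placeholders chosen for round numbers, not Shawn's NWHL weights or any other published model. Averaging per-game values is one simple way to compare players across many games.

```python
from statistics import mean

# Hypothetical placeholder weights for illustration only; a real model fits these from data.
PLACEHOLDER_WEIGHTS = {
    "goals": 1.0,
    "assists": 0.7,
    "shots": 0.1,
    "penalties_taken": -0.2,
}


def game_score(stats, weights=PLACEHOLDER_WEIGHTS):
    """Weighted sum of one player's single-game stats; missing stats count as zero."""
    return sum(weight * stats.get(stat, 0) for stat, weight in weights.items())


def average_game_score(games, weights=PLACEHOLDER_WEIGHTS):
    """Mean game score over a list of per-game stat dicts (e.g. a season)."""
    return mean(game_score(g, weights) for g in games)


# Made-up two-game sample for one player
season = [
    {"goals": 1, "assists": 1, "shots": 4, "penalties_taken": 1},
    {"shots": 2},
]
print(round(game_score(season[0]), 2))
print(round(average_game_score(season), 2))
```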
As Shawn noted in his work, game score is particularly valuable for analyzing performance in the NWHL because of the brevity of the regular season. Through the league’s first four seasons, the average length of a season was under 18 games. The 2019-20 season promises a schedule of 24 games, which is still less than a third of the NHL’s 82-game season. That brief schedule creates an opportunity for swings in shooting percentage to influence both a player’s production and our perception of their performance.
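The binomial standard error of an observed shooting percentage gives a rough sense of why short seasons matter. It is sqrt(p(1 - p) / n), which shrinks only with the square root of the number of shots. The sketch treats every shot as an independent trial with the same true conversion rate. The 10% rate and three shots per game are hypothetical inputs, not NWHL figures.

```python
import math


def shooting_pct_standard_error(true_pct, shots):
    """Binomial standard error of shooting percentage observed over `shots` shots."""
    return math.sqrt(true_pct * (1 - true_pct) / shots)


TRUE_PCT = 0.10      # hypothetical true-talent shooting percentage
SHOTS_PER_GAME = 3   # hypothetical shot volume for one skater

# Season lengths from the text: ~18 (early NWHL average), 24 (2019-20 NWHL), 82 (NHL)
for games in (18, 24, 82):
    shots = games * SHOTS_PER_GAME
    se = shooting_pct_standard_error(TRUE_PCT, shots)
    print(f"{games} games, {shots} shots: standard error {se:.1%}")
```

With these inputs, the standard error over an 18- or 24-game season is roughly twice the 82-game value.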
Something in hockey has been bugging me for years. Technically a lot of things about hockey bug me, but let’s not get sidetracked right off the bat. The irritating aspect of hockey I want to focus on today is neutral zone faceoff wins: they rarely lead to anything interesting.