Combining Manually-Tracked Data with Play-by-Play Data

This post assumes beginner knowledge of R.

If you’ve ever analyzed hockey data, then you’re probably familiar with the NHL’s Real Time Scoring System, which produces what’s more commonly known as play-by-play data. These data are publicly available and allow us to see every event recorded by the NHL in a given game. Shown below are selected details about the first 10 events from two games on February 18, 2019: Tampa Bay at Columbus and Vegas at Colorado.


The Importance of Pressure for a Successful Forecheck

Most of my posts so far have talked about zone exits from the perspective of the team trying to break out of their defensive zone. Now, let’s flip the script and discuss the team on the forecheck. This team does not have possession of the puck, but they are in their offensive zone, which is an advantage. So, how can they regain control?


Team Level Zone Exits

From past posts, we have a general sense of the basics of zone exits: zone exits are important because they get you out of your zone and towards an opportunity to score. The key to a successful zone exit is maintaining possession, ideally by avoiding the temptation to dump the puck out.

But so far, we have only looked at zone exits league-wide. Most fans care about one particular team more than the rest, but we haven’t looked at team-level results at all. So today, let’s see how each team has performed at zone exits over the past three seasons.


Visualizing and Quantifying Passing on the Power Play

Visualizing passes isn’t easy in hockey. In any given KHL game, there are between 700 and 900 passes. Somewhere between 65% and 85% are successful*. If you wanted to focus on just the successful ones, you’d have to find a way to meaningfully and concisely represent 500-700 events. Let’s start with something simpler: the power play. If we further restrict our target to passes by a single team during 5v4 power plays in the offensive zone (OZ), we still get between 40 and 50 passes per game per team. Looking at two random KHL games, you can see that this is still quite a lot of passes:

There are some trends to be picked up on, but it’s not very clean. And any semi-serious opposition scouting (especially of special teams) will take into account multiple games, which then leads to an unidentifiable mess when plotted.


So You Got Accepted To Present at a Sports Analytics Conference

First of all, congratulations if you got accepted. Kudos to you if you got accepted to a conference like RITSAC, a very well-run and well-curated conference. This is a wonderful accomplishment, and you should be proud. Tell your friends and family. Celebrate. Bask in the adoration.

Well, maybe not that last part. But you get my point. Your work clearly has some perceived value and is based on solid reasoning and data analysis.

So now what?


Revisiting NWHL Game Score

In March 2018, Shawn Ferris of Hockey Graphs introduced his NWHL Game Score, which was based on Dom Luszczyszyn’s NHL Game Score. It was groundbreaking work in women’s hockey analytics, which is still very much in its infancy — especially at the professional level.

Game score is a valuable tool that can give us a better understanding of a player’s performance than points for skaters or save percentage and goals against average for goaltenders. It provides us with a single value that incorporates relevant points of data which we can use to compare the performances of two or more players in a single game or over the course of many games, including seasons and careers.

As Shawn noted in his work, game score is particularly valuable for analyzing performance in the NWHL because of the brevity of the regular season. Through the league’s first four seasons, the average length of a season was under 18 games. The 2019-20 season promises a schedule of 24 games, which is still less than a third of the length of the NHL season. That brief schedule creates an opportunity for shooting percentage factors to influence both a player’s production and our perception of their performance.


Expected Goals Model with Pre-Shot Movement, Part 3: 2018-2019 Data

Yesterday we looked at the team and skater results from the 2016 – 2018 data that was used to train the xG model. That’s a pretty robust dataset, but it’s unfortunately a bit out of date. People care about this season, and past years are old news. So let’s take a look at the data that Corey Sznajder has tracked for 2018 – 2019 so far.


Expected Goals Model with Pre-Shot Movement, Part 2: Historic Team and Player Results

Intro

In the last post, we introduced a new expected goals (xG) model. It incorporates pre-shot movement, which made it more accurate than existing public xG models when predicting which shots would be goals. However, we use xG models for far more than looking at individual shots. By aggregating expected goals at the player and team level, we can get a better sense of how each of them performs.
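As a rough illustration of what aggregating expected goals means in practice, here is a minimal Python sketch. It assumes each shot is already tagged with a team, a shooter, and an xG value. The records, the names, and the `aggregate_xg` helper are all hypothetical and are not taken from the model or data discussed in the post.

```python
from collections import defaultdict

# Hypothetical shot-level records: (team, shooter, xG of the shot).
# These values are invented for illustration only.
shots = [
    ("TBL", "Player A", 0.25),
    ("TBL", "Player B", 0.5),
    ("TBL", "Player A", 0.125),
    ("CBJ", "Player C", 0.25),
]


def aggregate_xg(shot_records):
    """Sum shot-level xG into team totals and (team, shooter) totals."""
    team_totals = defaultdict(float)
    player_totals = defaultdict(float)
    for team, shooter, xg in shot_records:
        team_totals[team] += xg
        player_totals[(team, shooter)] += xg
    return dict(team_totals), dict(player_totals)


team_xg, player_xg = aggregate_xg(shots)

# Print teams from highest to lowest total xG.
for team, total in sorted(team_xg.items(), key=lambda item: -item[1]):
    print(f"{team}: {total:.3f} xG")
```

Summing is the simplest possible aggregation. A real analysis would likely also split by game state and normalize by ice time, which this sketch leaves out.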
