Hacking the NHL Play-by-Play App in Shiny

Recently, I created a web application for interactively visualizing shot data for all games in the 2017-2018 season. In this article, I will walk through the month-long process of building the National Hockey League Play-by-Play App from scratch, giving a behind-the-scenes look.

What started this project was this #rstats Shiny contest tweet. Shiny is an R package built by RStudio for creating interactive web applications. It allows R programmers to create web applications without having to code exclusively in HTML, CSS or JavaScript. I had looked at several sports visualizations (e.g. Ryo’s Visualize the World Cup) and wanted to create something similar in hockey. This announcement provided the motivation for me to start.

I started by sharpening my Shiny skills with DataCamp’s Shiny Course. I found Chapter 2 (Inputs, outputs, and rendering functions) and Chapter 3 (Reactive Programming) particularly helpful in reminding myself of the essence of Shiny. They are great visual learning resources, and I highly recommend this course to Shiny beginners.

Next, I focused on the structure of my application. The organization of the end product is instrumental to its usability, so I wanted to get it right. I looked at the Shiny Application Layout Guide and decided to go with the Grid Layout, which places a plot at the top and the plot’s parameters at the bottom in a three-column format. I found this the best organization for focusing users on the animation at the top. The secondary features, which are the parameters controlling the plot, are stationed at the bottom.

Now, to the actual animation. I relied on Ryo’s World Cup animations, which were rendered in gganimate, an R package for animations. Unfortunately, unlike Ryo’s dataset, my dataset didn’t contain coordinates for the location of each player over time. Rather, my Play-by-Play, Real Time Scoring System dataset only had shot locations:

Figure 1: Snapshot of raw shot data by Corsica Hockey

If the NHL had tracked real-time coordinate data like the NFL, I could have created a fluid animation like this:

Figure 2: Tyreek Hill’s TD reception during Week 1 of 17/18 season. Video here. Source: http://bit.ly/nfl-bigdata

So, here is a hack I came up with. First, I “normalized” the shot locations so that all shots taken by the home team were shown on the right and shots taken by the away team were shown on the left. Then, after every shot location row, I inserted the (x, y) coordinates (82, 0) and (-82, 0) to mark the locations of the two nets. Next, I created a column called event_index that groups each pair of rows (one row for the shot location, one row for the net location). I then created a column called event_frame that numbers all the rows. Last, I used the group aesthetic on event_index and added transition_components(time = event_frame) to render the animation.

Figure 3: Data processed for animation
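
The row-building steps described above can be sketched roughly as follows. This is a minimal, language-agnostic illustration in Python, not the R code behind the app. The field names team_side, x, and y, the helper build_animation_rows, and the choice to mirror a shot by negating both coordinates are all assumptions made for this example. It also assumes each shot is paired with the single net on its own side.

```python
NET_X = 82  # nets sit at (82, 0) and (-82, 0), per the text


def build_animation_rows(shots):
    """Interleave shot rows with net rows and number them for animation.

    `shots` is a list of dicts with hypothetical keys:
    'team_side' ('home' or 'away'), 'x', and 'y'.
    """
    rows = []
    for event_index, shot in enumerate(shots, start=1):
        is_home = shot["team_side"] == "home"
        x, y = shot["x"], shot["y"]
        # Normalize sides: home shots on the right (x >= 0), away on the left.
        # Mirroring by negating both coordinates is an assumption.
        if (is_home and x < 0) or (not is_home and x > 0):
            x, y = -x, -y
        net_x = NET_X if is_home else -NET_X
        # One row for the shot location, one row for the net location,
        # sharing the same event_index so they form one group.
        rows.append({"event_index": event_index, "kind": "shot", "x": x, "y": y})
        rows.append({"event_index": event_index, "kind": "net", "x": net_x, "y": 0})
    # event_frame simply numbers every row in order.
    for event_frame, row in enumerate(rows, start=1):
        row["event_frame"] = event_frame
    return rows


example_shots = [
    {"team_side": "home", "x": -60, "y": 10},
    {"team_side": "away", "x": -70, "y": -5},
]
animation_rows = build_animation_rows(example_shots)
```

Each pair of rows shares an event_index, so a plotting layer grouped on that column would draw a path from the shot location to the net, while event_frame gives the ordering for the transition.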

This was all great, but I realized that the gganimate package doesn’t work well with Shiny. There is no function designed to render gganimate animations in Shiny. In other words, there was no natural way to put my animations in my end product, which was a huge concern.

This Stack Overflow answer was super helpful in coming up with another hack. It recommended saving the animation as a .gif file and returning the file in a list along with the dimensions of the animation. There is one drawback to this method, though: the animation looks stretched out if I increase the width too much, and it moves downward if I increase the height too much. As a result, what I currently have is the best I could come up with: high image resolution and optimal placement.

The animation happens on an NHL ice rink created by War On Ice. I added “reactive” team logos in Shiny to clearly indicate which side is home and which is away. Also, in the app, users need to input the official game ID in order to navigate between games. To facilitate this process, I included a datatable of all the game IDs, game dates, home teams, and away teams next to the animation. That way, the user can find the desired game by searching through game dates or teams, locate the right game ID, and render the right animation.

Figure 4: Animation of 2017-10-04 Regular Season Game between the Toronto Maple Leafs and the Winnipeg Jets

Now, the other visualizations. I took a long, hard look at the dataset and thought about which columns to make use of. I thought the shot distance was pretty interesting, so I created a histogram of the shot distance. This illustrates the number of shots a team took at a certain distance from the net. To help the user interpret the distances, I labelled the locations of the faceoff circles, the blue line, and the red line. Furthermore, expected goal probability is a frequently occurring metric in hockey analytics discussions. I thought it would be interesting to see how it changes throughout the game. As a result, I animated expected goal probabilities for each team. This plot generated the most buzz.
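
The counting behind the shot distance histogram can be sketched as below. This is an illustrative Python sketch rather than the app's code. The keys team and distance, the 5-foot bin width, and the sample values are assumptions invented for the example.

```python
from collections import Counter


def shot_distance_histogram(shots, bin_width=5):
    """Count shots per team in fixed-width distance bins.

    `shots` is a list of dicts with hypothetical keys 'team' and
    'distance' (distance from the net, in feet).
    Returns a dict mapping (team, bin_start) to a shot count.
    """
    counts = Counter()
    for shot in shots:
        # Floor each distance to the start of its bin, e.g. 12 -> 10.
        bin_start = int(shot["distance"] // bin_width) * bin_width
        counts[(shot["team"], bin_start)] += 1
    return dict(counts)


# Made-up sample values, for illustration only.
sample_shots = [
    {"team": "TOR", "distance": 12},
    {"team": "TOR", "distance": 14.5},
    {"team": "TOR", "distance": 48},
    {"team": "WPG", "distance": 4.9},
]
histogram = shot_distance_histogram(sample_shots)
```

Each (team, bin_start) key corresponds to one bar, so the heights for one team show how many shots it took from each distance range.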

Figure 5: Animation of Expected Goal Probability during 2017-10-04 Regular Season Game between the Toronto Maple Leafs and the Winnipeg Jets

Last, I wanted to include a summary of the game by showing the boxscore. However, I ran into too many roadblocks with HTML and CSS, so I decided to simply show the official nhl.com recap.

Some neat features I added to the app include a short tour built with the rintrojs package. When the user presses the Help button in the top right corner, Shiny gives a short tour explaining what each of the parameters does. Also, the “Share” button allows users to easily share the app with a custom message I included, and the “Code” button redirects users to the GitHub repo.

Figure 6: Illustration of the rintrojs package

The final product is available here: NHL Play-by-Play App

The Epilogue to Quantifying Differences between the Regular Season and the Playoffs

Introduction

After several months of learning the concept of survival analysis and applying it to hockey, I published my article, “Quantifying Differences between the Regular Season and Playoffs using Survival Analysis”. Among the readers, one noteworthy individual in the sports analytics community commented on Twitter:

His tweet motivated this brief analysis to answer the first question: “Can I repeat my previous analysis for the regular season by period?” First, I look only at regular season data and change the treatment variable from whether the game is played during the regular season or playoffs to whether it was played in Period X vs Period Y. Then, I approach Tom’s question in a different way by keeping the treatment variable as regular season vs playoffs, but filtering for the 1st, 2nd, and 3rd periods. This further shows how the change in event rates differs by period.

Continue reading

Wins Above Replacement: The Process (Part 2)

In part 1, we covered WAR in hockey and baseball, discussed each field’s prior philosophies, and cemented the goals for our own WAR model. This part will be devoted to the process – how we assign value to players over multiple components to sum to a total value for any given player. We’ll cover the two main modeling aspects and how we adjust for overall team performance. Given our affinity for baseball’s philosophy and the overall influence it’s had on us, let’s first go back to baseball and look at how they do it, briefly.

Continue reading

Reviving Regularized Adjusted Plus-Minus for Hockey

Introduction

In this piece we will cover Adjusted Plus-Minus (APM) / Regularized Adjusted Plus-Minus (RAPM) as a method for evaluating skaters in the NHL. Some of you may be familiar with this process – both of these methods were developed for evaluating players in the NBA and have since been modified to do the same for skaters in the NHL. We first need to acknowledge the work of Brian Macdonald. He proposed how the NBA RAPM models could be applied for skater evaluation in hockey in three papers on the subject: paper 1, paper 2, and paper 3. We highly encourage you to read these papers as they were instrumental in our own development of the RAPM method.

While the APM/RAPM method is established in the NBA and to a much lesser extent the NHL, we feel (especially for hockey) revisiting the history, process, and implementation of the RAPM technique is overdue. This method has become the go-to public framework for evaluating a given player’s value within the NBA. There are multiple versions of the framework, which we can collectively call “regression analysis”, but APM was the original method developed. The goal of this type of analysis (APM/RAPM) is to isolate a given player’s contribution while on the ice independent of all factors that we can account for. Put simply, this allows us to better measure the individual performance of a given player in an environment where many factors can impact their raw results. We will start with the history of the technique, move on to a demonstration of how linear regression works for this purpose, and finally cover how we apply this to measuring skater performance in the NHL.

Continue reading

Quantifying Differences between the Regular Season and Playoffs using Survival Analysis

Introduction

From a casual fan’s perspective, the intensity traditionally ramps up in the playoffs because teams are closer to the grand prize, the Stanley Cup. Fans are hyped up by the storylines and rivalries for every series, and so each event feels all the more momentous. So, how different are the rates of goals, shots, or hits from the regular season to the playoffs? Does the fact that a game is played during the playoffs change these rates significantly? Which rates don’t change that much?

Continue reading

Public Ballots May Be Changing Award Voting Behavior

My office was recently planning an offsite social event. During a team meeting, we brainstormed what activity to do together. Along with ideas like mini golf, hiking, and wine tasting, someone suggested karaoke. The team initially responded positively, so when everyone turned to me, I said “sure, that sounds fun”. Then someone put the options in a Google Form for us to all vote on privately. I opened it at my desk and immediately voted for karaoke dead last. I didn’t want to be a downer in public, but there was no way I was doing karaoke.

Being in public changes our behavior. It’s a natural trait and totally understandable. What’s interesting is understanding when and how it changes, and the NHL awards voting may have given us an opportunity to do just that. For the 2017-2018 season, the Professional Hockey Writers Association (PHWA) made their individual voter ballots public for the first time, and it appears that this may have affected how some writers voted.

Continue reading

NHL Scoring Trends, 2007-08 to 2017-18: Is the League Getting More Competitive?

Photo by Bobby Schultz, via Wikimedia Commons

Though it was completely tangential to @SteveBurtch’s line of thinking, his brief comments pondering the competitiveness between the middle of NHL lineups yesterday (which I can’t locate now, natch) got me thinking about whether the NHL and team management have gotten any more efficient or competitive overall over the last decade. With 10 years in the books for complex Corsi data, and hockey’s seeming “Moneyball moment” fully here regardless of the quibbling on social and mainstream media, is the league getting any tighter?

Continue reading

Goal Scorer Cluster Analysis

“They don’t ask how. They ask how many.”

-Hockey Proverb

“But seriously though… how?”

-Me

To state the obvious: goal-scoring is an essential skill for a hockey team. Players have made long careers by putting the puck in the net.

But how do players create goals? Skaters rely on all sorts of skills to score; some are fast, some have a huge shot, and some know how to be in the right place for an easy tap-in. But we don’t have a rigorous view of what those skills are, how they fit together, and which players rely on which ones.

In this piece, I take 100 of the top NHL goal-scorers and apply unsupervised learning techniques to group them into specific goal-scoring types. The result is a classification that buckets the scorers into 5 categories: bombers, rushers, chance makers, chaos makers, and physical forces. This classification can help players understand how to apply their skill set to goal-scoring. It can also help teams make sure that their system is putting their top players in a position to score.

Continue reading

Estimating Shot Assist Quantities for Skaters

Hockey fans and analysts have always appreciated the importance of passing. But until the passing project led by Ryan Stimson, we couldn’t quantify that importance. His work, supported by a team of volunteers and other analysts, has established that the passing sequence prior to a shot is a significant predictor of the likelihood of the shot becoming a goal. His work also showed that measuring shots and shot assists combined as shot contributions is a better predictor of future performance for both players and teams than shots alone.

Knowing that, the logical next step is to use passing data in analysis whenever possible. Unfortunately, the NHL does not provide passing data, so it must be manually tracked by people like Corey Sznajder. Corey’s work is invaluable, and I encourage you to support him, but he’s only one person.

This article attempts to estimate a player’s quantity of shot assists in a given sample using publicly available data to help fill in gaps where tracked data doesn’t exist.

Continue reading

Who You Calling Weak? Draft Class Variance

This year’s NHL draft class is weak. I don’t follow junior prospects closely, but that’s what I’ve heard from more knowledgeable sources. It’s a fair claim; Nolan Patrick and Nico Hischier seem talented but not among the game-changing talents that have recently been drafted first overall.

However, it’s harder to judge the draft class past the very top. Scouting is hard, especially for hundreds of prospects across the world. It’s possible that while there is no clear star in the draft class, the rest of the draft is as strong as ever.

That would have big implications for draft strategy. The conventional wisdom is that teams may trade more picks this year because they believe the weak draft class makes the picks less valuable. But if the draft is typical after the first few picks, that would be a poor use of assets.

We don’t yet know how well this year’s draft class will do in the NHL. But we can use historical data to ask questions that establish expectations: how well does each draft class typically perform, and how much does this vary by year?

Continue reading