Building a Shot-Plotting App in Shiny

For me at least, hand tracking is 99% of the time born out of necessity. 

The only way I am ever going to get location data for shots is if I break out a multicoloured pen and write down all the locations and numbers myself. It isn’t, however, exactly the quickest process to deal with.

The thing is, I actually really enjoy hand tracking. It keeps me focused on the game at hand and stops my mind from wandering. The issue comes when it’s time to digitise that information for analysis. I have written about this before over at The Ice Garden, back when I tracked an entire season of the Australian Women’s Hockey League. That season it took me around an hour of straight work to plug in every piece of information so that Tableau could process it, and as my life got busier, the amount of free time I could dedicate got less and less.

The idea to force a Shiny app to do something it has no right to do came out of necessity. Partly it was because I wanted to be able to show heat maps to the Head Coach of the local team I work with during intermission. Mostly, though, it was because my Masters project consists of getting school kids aged 11+ involved in sports analytics. I really wanted them to be able to produce their own heat maps, and I really did not want to attempt to explain the complexities of Kernel Density Charts to a collection of 12-year-olds.

So here we are. 

The Hockey Plotter 1.1

Continue reading

How to Debug Data Science Code

Think of everyone who has a talent you admire. Athletes, writers, anyone. If you were to ask each of them for the secret to their success, how many of them would be able to give the true answer? I’m not saying that they would deliberately lie. Rather, it’s just genuinely very hard to objectively assess oneself and turn natural implicit behaviors into explicit lessons that can be described to others.

Implicit lessons can be a barrier to people learning new skills: it’s much harder to learn something if their instructor doesn’t know it’s something they ought to teach. The best teachers are able to put themselves into the shoes of their students and convey the most important pieces of information.

One area of data science that is too often left implicit is troubleshooting. Everyone who writes code will get error messages. This is frustrating and can halt progress until solved. Yet most resources devoted to teaching new data scientists don’t discuss what to do, as if they’re expected to study enough to code everything correctly the first time and never encounter an unexpected error. You can find articles about common mistakes that data scientists make, but what about when you inevitably make an uncommon one? There are very few resources around how to debug broken code. (This one is quite nice, and these two are worth a read as well.) 

That’s what I’m hoping to partially remedy with this article. It’s far from the single canonical process for debugging, but I hope that it helps people get unstuck while they learn. The key points I want to convey are:

  • Every data scientist hits error messages regularly, and doing so as a new programmer is not a sign of failure
  • Isolate the issue by finding the smallest piece of code that creates the problem
  • The exact language of an error message can be extremely helpful, even if it doesn’t make sense
  • The internet is (only in this particular instance) your friend, and there are specific resources that are particularly helpful for solving problems
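To make the second and third points a little more concrete, here is a minimal sketch in Python of one way to isolate a problem. Everything in it is made up for illustration: the three-step pipeline, the function names, and the sample data are assumptions, not code from any real project. The idea is simply to run each step on its own, stop at the first one that fails, and keep the exact text of the error message.

```python
# Hypothetical three-step pipeline; the function names and the sample
# data are invented purely for illustration.
def parse_rows(raw):
    """Split raw comma-separated text into rows of string cells."""
    return [line.split(",") for line in raw.strip().splitlines()]


def to_numbers(rows):
    """Convert every cell to an integer."""
    return [[int(cell) for cell in row] for row in rows]


def row_totals(rows):
    """Sum each row."""
    return [sum(row) for row in rows]


def find_failing_step(data, steps):
    """Run the steps in order and report the first one that raises."""
    for step in steps:
        try:
            data = step(data)
        except Exception as err:
            # Keep the exact wording of the message: it is the text you
            # would paste into a search engine.
            return step.__name__, f"{type(err).__name__}: {err}"
    return None, None


raw = "1,2,3\n4,five,6\n"  # one cell is not a number
failing_step, message = find_failing_step(raw, [parse_rows, to_numbers, row_totals])
print("First failing step:", failing_step)
print("Exact error message:", message)
```

Once the failing step is known, the next move is usually to shrink its input as well (here, down to the single cell that will not convert) before searching for the message.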

Continue reading

An Introduction to R With Hockey Data

I have written a couple articles over the past few months on using R with hockey data (see here and here), but both of those articles were focused on intermediate techniques and presumed beginner knowledge of R. In contrast, this article is for the complete beginner. We’ll go through the steps of downloading and setting up R and then, with the use of a sample hockey data set, learn the very basics of R for exploring and visualizing data.

One of the wonderful things about using R is that it’s a flexible, growing language, meaning that there are often many different ways to get to the same, correct result. The examples below are meant to be a gentle introduction to different parts of R, but please know that this really only scratches the surface of what’s available.

The code used for this tutorial (which also includes more detail and more examples) is available on our GitHub here.

Downloading R and Getting Set Up

Continue reading

Expected Goals Model with Pre-Shot Movement, Part 3: 2018-2019 Data

Yesterday we looked at the team and skater results from the 2016 – 2018 data that was used to train the xG model. That’s a pretty robust dataset, but it’s unfortunately a bit out of date. People care about this season, and past years are old news. So let’s take a look at the data that Corey Sznajder has tracked for 2018 – 2019 so far.

Continue reading

Reviving Regularized Adjusted Plus-Minus for Hockey

Introduction

In this piece we will cover Adjusted Plus-Minus (APM) / Regularized Adjusted Plus-Minus (RAPM) as a method for evaluating skaters in the NHL. Some of you may be familiar with this process – both of these methods were developed for evaluating players in the NBA and have since been modified to do the same for skaters in the NHL. We first need to acknowledge the work of Brian Macdonald. He proposed how the NBA RAPM models could be applied for skater evaluation in hockey in three papers on the subject: paper 1, paper 2, and paper 3. We highly encourage you to read these papers as they were instrumental in our own development of the RAPM method.

While the APM/RAPM method is established in the NBA and to a much lesser extent the NHL, we feel (especially for hockey) revisiting the history, process, and implementation of the RAPM technique is overdue. This method has become the go-to public framework for evaluating a given player’s value within the NBA. There are multiple versions of the framework, which we can collectively call “regression analysis”, but APM was the original method developed. The goal of this type of analysis (APM/RAPM) is to isolate a given player’s contribution while on the ice independent of all factors that we can account for. Put simply, this allows us to better measure the individual performance of a given player in an environment where many factors can impact their raw results. We will start with the history of the technique, move on to a demonstration of how linear regression works for this purpose, and finally cover how we apply this to measuring skater performance in the NHL.
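As a rough illustration of the idea of isolating a player’s contribution with regularized regression, here is a toy sketch in Python. It is not the model described in this piece. The four players, the five stints, the +1/-1 on-ice coding, the shot differentials, and the penalty value are all assumptions made up for the example, and a real model would account for many more factors. Each row is a stint, each column is a player (+1 on the ice for one side, -1 for the other, 0 on the bench), and ridge regression estimates one rating per player.

```python
# Toy ridge-regression (RAPM-style) sketch. All data below is invented.
def ridge_fit(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y with Gauss-Jordan elimination."""
    n = len(X[0])
    # Build the augmented matrix [X'X + lam*I | X'y].
    A = []
    for i in range(n):
        row = [sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
               for j in range(n)]
        row.append(sum(r[i] * target for r, target in zip(X, y)))
        A.append(row)
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


players = ["A", "B", "C", "D"]  # hypothetical: A and B face C and D
# One row per stint: +1 = on ice for side one, -1 = on ice for side two.
stints = [
    [1, 0, -1, 0],   # A vs C
    [1, 0, 0, -1],   # A vs D
    [0, 1, -1, 0],   # B vs C
    [0, 1, 0, -1],   # B vs D
    [1, 1, -1, -1],  # A and B vs C and D
]
# Made-up shot differential for side one in each stint.
shot_diff = [1.0, 3.0, -1.0, 1.0, 2.0]

ratings = dict(zip(players, ridge_fit(stints, shot_diff, lam=1.0)))
heavier = dict(zip(players, ridge_fit(stints, shot_diff, lam=10.0)))
for name in players:
    print(name, round(ratings[name], 3), round(heavier[name], 3))
```

In this toy data the columns are not independent (adding the same constant to every player’s rating leaves every stint unchanged), so ordinary least squares has no unique answer. The penalty term is what makes the system solvable, and a larger penalty pulls every rating closer to zero.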

Continue reading

Data Viz in Excel – Tips & Tricks

These days, everyone and their mother is going to tell you to learn to code if you want to jump into sports analytics. And while I’m not going to say “don’t do it,” I am a petty betch who really hates being told what to do (see: my on-going resistance to yoga).

Also, I’m busy, and learning to code is a whole thing that takes time. You are also probably busy, or maybe just starting to dip your toe into sports analytics as a hobby. Maybe you’ve tried learning to code and it just doesn’t make sense to you.

None of that should discourage you from playing around with hockey data and writing up what you find. In fact, there’s a perfectly good tool you can use to visualize most of the basics. Excel!

Excel gets made fun of for many reasons, but what I see most often is cutting comments about its basic visualization tools. To put it nicely, they’re…rough.

But making pleasing, easy-to-understand viz with Excel is possible! I’ve done it! Multiple times!

So, I’ve written down some of my best tips, most of which are applicable when you’re using a more powerful program, too.

1) Know what you want to show and why you want to show it. 

Continue reading

How to Get Started in Hockey Analytics

Intro

Analytics, so hot right now. But how do you get started? People from all sorts of backgrounds and levels of expertise have contributed valuable work to hockey analytics, but the journey can feel daunting.

In this post, I want to lay out my personal advice for what knowledge and skills are needed and how to get them. Your mileage will vary, but I think much of this will be useful to anyone who is interested in starting to do their own analytics research or writing.

Continue reading

Revisiting Relative Shot Metrics – Part 2

In part 1, I described three “pen and paper” methods for evaluating players based on performance relative to their teammates. As I mentioned, there is some confusion around what differentiates the relative to team (Rel Team) and relative to teammate (Rel TM) methods (it also doesn’t help that we’re dealing with two metrics that have the same name save four letters). I thought it would be worthwhile to compare them in various ways. The following comparisons will help us explore how each one works, what each tells us, and how we can use them (or which we should use). Additionally, I’ll attempt to tie it all together as we look into some of the adjustments I covered at the end of part 1.

A quick note: WOWY is a unique approach, which limits its comparative potential in this regard. As a result, I won’t be evaluating/comparing the WOWY method further. However, we’ll dive into some WOWYs to explore the Rel TM metric a bit later.

Rel Team vs. Rel TM

Note: For the rest of the article, the “low TOI” adjustment will be included in the Rel TM calculation. Additionally, “unadjusted” and “adjusted” will indicate if the team adjustment is implemented. All data used from here on is from the past ten seasons (’07-08 through ’16-17), is even-strength, and includes only qualified skaters (minimum of 336 minutes for Forwards and 429 minutes for Defensemen per season as estimated by the top 390 F and 210 D per season over this timeframe).

Below, I plotted Rel Team against both the adjusted and unadjusted Rel TM numbers. I have shaded the points based on each skater’s team’s EV Corsi differential in the games that skater played in:

[Figure: Rel Team plotted against adjusted and unadjusted Rel TM]

Continue reading

Revisiting Relative Shot Metrics – Part 1

Relative shot metrics have been around for years. I realized this past summer, however, that I didn’t really know what differentiated them, and attempting to implement or use a metric that you don’t fully understand can be problematic. They’ve been available pretty much anywhere you could find hockey numbers forever and have often been regarded as the “best” version of whatever metric they were used for to evaluate skaters (Corsi/Fenwick/Expected Goals). So I took it upon myself to gain a better understanding of what they are and how they work. In part 1, I’ll summarize the various types of relative shot metrics and show how each is calculated. I’ll be focusing on relative to team, WOWY (with or without you), and the relative to teammate methods.

A Brief Summary

All relative shot metrics, whether WOWY, relative to team (Rel Team), or relative to teammate (Rel TM), are essentially trying to answer the same question: how well did any given player perform relative to that player’s teammates? Let’s briefly discuss the idea behind this question and why it was asked in the first place. Corsi, in its usual form of on-ice Corsi For % (abbreviated CF%), is easily the most recognizable statistic outside of the standard NHL-provided boxscore metrics. A player’s on-ice CF% accounts for all shot attempts taken and allowed (Corsi For / (Corsi For + Corsi Against)) when that player was on the ice (if you’re unfamiliar please check out this explainer from JenLC). While this may be useful for some cursory or high-level analysis, it does not account for a player’s team or a player’s teammates.
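For anyone who wants to see the CF% arithmetic spelled out, here is a small sketch in Python. The function name and the shot-attempt counts are made up for illustration; only the formula, Corsi For / (Corsi For + Corsi Against), comes from the paragraph above.

```python
def corsi_for_pct(corsi_for, corsi_against):
    """On-ice CF% as a percentage: CF / (CF + CA) * 100.

    Returns None when no shot attempts were recorded, since the
    ratio is undefined in that case.
    """
    total = corsi_for + corsi_against
    if total == 0:
        return None
    return 100.0 * corsi_for / total


# Hypothetical on-ice totals for one skater.
cf_pct = corsi_for_pct(corsi_for=55, corsi_against=45)
print(f"CF%: {cf_pct:.1f}")
```

A skater who was on the ice for 55 shot attempts for and 45 against has a CF% of 55.0. Note that nothing in the calculation knows how strong that skater’s team or linemates were.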

Continue reading

An Introduction To New Tracking Technology

The first significant breakthrough in hockey analytics occurred in the mid-2000s, when analysts discovered the importance of Corsi in describing and predicting future success. Since that time, we’ve seen the creation of expected goals, WAR models, and more. Many have suggested that the next big breakthrough in hockey analytics will come once the NHL is able to provide tracking data. We’ve already seen some of the incredible applications of the MLB’s Statcast data and the NBA’s SportVU data. Unfortunately, the NHL has no immediate plans to publicly provide this data and, as such, many analysts have decided to collect the data manually.

Continue reading