Exploratory Data Analysis Using Tidyverse

This post assumes beginner knowledge of R.

Welcome to the second article in our series on basic data cleaning and data manipulation! In this article, we’re going to use play-by-play data from two NHL games and answer two questions:

  • which power play unit generated the best shot rate in each game?
  • which defenseman played the most 5v5 minutes in each game?

In the process of doing so, we’ll cover several topics of basic data manipulation in the tidyverse, including using functions, creating joins, grouping and summarizing data, and working with string data.

Continue reading

Improving Opposition Analysis by Examining Tactical Matchups

On Monday, I introduced some work on quantifying and identifying team playing styles, which built upon my earlier work on identifying individual playing styles. Today we’re going to discuss how to make this data actionable.

What are the quantifiable traits of successful teams? What plays are they executing that make them successful? How can we use data to build a style of play that is more successful than what we’re currently doing? The way we bridge the gap between the front office and the bench is by providing data that improves matchup preparation, lineup optimization, and tactical decisions.

This is what I mean by actionable: applying data-driven analysis and decision-making inside the coach’s room and on the ice. All data is from 5v5 situations and is either from the Passing Project or from Corsica.

Continue reading

Identifying Playing Styles with Clustering

One of the aspects of player performance that is discussed ad nauseam is chemistry. How well do certain players elevate their performance with one player or another due to some inherent ability to find the other on the ice? To know what a teammate is going to do? However, very little has been done to analyze this phenomenon. In this piece, I posit that by identifying playing styles, something that’s been done in the NBA, we can quantify how well certain players will complement one another.

All data is from 5v5 situations from the 2015 – 2016 and current season, totaling almost 900 games from the Passing Project volunteers and Corey Sznajder. Special thanks to Asmae for her guidance throughout this piece.

I want to stress that this is a first foray into this type of analysis. Having a different style from another player does not make someone the better player, and the style names I’ve chosen are relatively arbitrary. Players may have similar styles, but some will simply be more effective due to their ability. Finally, given that each day we accumulate more data, a player with a smaller sample size could find themselves in a different cluster in future analysis.

Continue reading

Redefining Defensemen based on Transitional Play

Last time, I showed how passing data is a better predictor of future player scoring than existing public metrics. In this piece, I’m going to spend some time talking about how we can more reliably evaluate offensive and defensive contributions from defensemen, which has been difficult due to both a lack of data and a lack of flexibility regarding the identity of the position. The position is traditionally thought of as existing to defend and “make a good first pass,” a view that I feel limits the scope of both how we evaluate the position and its responsibilities.

In order to better evaluate defensemen, we need to identify specific metrics that we can tie to future goals. In looking at entry assists (a pass occurring in the neutral or defensive zones that precedes a shot), both for and against, we can quantify how effective a defenseman is at generating offense in transition, as well as at suppressing those chances. The importance of those things at the team level is something I’ve previously discussed (transition here and defensive work here with Matt Cane). Once we identify these metrics as having a strong impact on future scoring and goal-suppression, we naturally then reevaluate what the proper roles are for a defenseman, which in turn forces us to reevaluate how we evaluate them.
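As a rough illustration of the entry assist definition above, here is a minimal Python sketch that counts them per player. The record layout (the `passer`, `zone`, and `led_to_shot` fields) and the sample rows are hypothetical and are not the Passing Project's actual format.

```python
from collections import Counter

# Hypothetical tracked passes; field names and values are made up for illustration.
passes = [
    {"passer": "D1", "zone": "neutral", "led_to_shot": True},
    {"passer": "D1", "zone": "offensive", "led_to_shot": True},
    {"passer": "D2", "zone": "defensive", "led_to_shot": True},
    {"passer": "D2", "zone": "neutral", "led_to_shot": False},
    {"passer": "D1", "zone": "defensive", "led_to_shot": True},
]

# Per the definition in the text: the pass starts in the neutral or defensive zone.
ENTRY_ZONES = {"neutral", "defensive"}


def count_entry_assists(passes):
    """Count passes from the neutral/defensive zone that preceded a shot, per passer."""
    counts = Counter()
    for p in passes:
        if p["zone"] in ENTRY_ZONES and p["led_to_shot"]:
            counts[p["passer"]] += 1
    return counts


entry_assists = count_entry_assists(passes)
print(dict(entry_assists))
```

The same filter applied to passes made by the opposition while a defenseman is on the ice would give the "against" side of the metric.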

Personally, I’d like to see us think of them more as fullbacks or midfielders in soccer (this is part of a larger concept of redefining positions and responsibilities, which will be posted in the next month or so, I hope). There are still going to be various types of players based on their individual skill set and team tactics, but supporting play, overlapping on the attack, and distribution are all pillars of what teams should look for. Let’s get to it.

All data is from 5v5 situations and special thanks to Dr. McCurdy for pulling the on-ice player data for me. All non-passing project data is from Corsica.

Continue reading

Just How Important is Quality of Competition? Very. Also, not much. It’s All Relative.

*This post is co-authored by DTMAboutHeart and Ryan Stimson*

Recently, the topic of Quality of Competition has been at the forefront of Hockey Twitter. This post hopes to articulate some of the nuance associated with Quality of Competition, as well as Quality of Teammate, metrics and how impactful they are. To do that, we will revisit methods outlined here by Eric Tulsky, namely splitting the competition and teammate quality by position and measuring the impact of each split. Ryan recently wrote about this at the NCAA level, but it has not been looked at with much rigor at the NHL level.

Both Quality of Competition and Quality of Teammates matter. They also don’t matter. It depends on the position and metric you’re looking at. All TOI data is 5v5 and from Corsica. Ryan had the game files of who was on the ice during each 5v5 shot from Micah Blake McCurdy, so that data was used as well. Also, thanks to Muneeb for feedback during this process. Thanks to all!

Continue reading

Matt Hunwick, Martin Marincin and Quality of Competition

During the offseason, the Toronto Maple Leafs made two small additions to their blueline that were lauded by many in the analytics community. At the draft, they traded a fourth-round pick and a low-tier prospect for Martin Marincin, and on the first day of free agency they signed Matt Hunwick to a low-money, two-year deal.

Both players had very similar trajectories over the previous three seasons. Marincin had a relative shots percentage of +4.3 while playing 15.7 minutes per night, while Hunwick landed at +2.8 percent playing 15.3 minutes. Looking at just the 2014-15 season, Hunwick had the edge at +5.1 in 14.3 minutes to Marincin’s +2.4 in 16.1 minutes. Basically, the Leafs acquired two decent and under-appreciated defensemen who have shown an ability to push play in the right direction, and at a relatively low cost too.

Flash forward to the culmination of their first seasons as Leafs and opinions of the two couldn’t be more different. Marincin is praised regularly while Hunwick is seen as a proverbial boat anchor.

So what’s changed exactly?

Continue reading

Delta Box Score: a model for predicting player scoring independent of teammate quality

 

Introduction

One of the greatest challenges in sports analytics is determining the skill of a player independently of quality of teammates. While a number of tools already exist (e.g. WOWYs in hockey), their (mis)use lends itself to significant limitations and collinearity concerns. This is where regression-based approaches can provide a more rigorous alternative in isolating a player’s true talent.  

An encouraging development in hockey analytics as of late has been Ryan Stimson’s Passing Project, which you can read about here. The goal of this post is to introduce a regression-based method to estimate an NHL player’s expected scoring performance independently of the passing strength of his teammates. To this end, player and linemate data for the 2014-2015 season, drawn from Stimson’s Passing Project and from Muneeb Alam, were used to devise a rate-based metric of a player’s projected goals. The difference between a player’s projected goals per 60 minutes and actual goals per 60 minutes will be called Delta Box Score or DBS.
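To make the arithmetic behind DBS concrete, here is a minimal Python sketch. The projected rate would come from the regression model, which is not reproduced here, so a placeholder number stands in for it. The sign convention (actual minus projected) and the sample figures are assumptions for illustration only.

```python
def per_60(goals: float, toi_minutes: float) -> float:
    """Convert a raw goal count into a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    return goals * 60.0 / toi_minutes


def delta_box_score(projected_g60: float, actual_g60: float) -> float:
    """Difference between actual and projected goals per 60 minutes.

    Assumption: the direction of the subtraction is chosen here for
    illustration; the text only says "the difference".
    """
    return actual_g60 - projected_g60


# Hypothetical player: 12 goals in 900 minutes, with a placeholder projection.
actual = per_60(12, 900)
projected = 0.65  # stand-in for the regression output
dbs = delta_box_score(projected, actual)
print(f"actual G/60 = {actual:.2f}, projected G/60 = {projected:.2f}, DBS = {dbs:+.2f}")
```

Under this convention a positive value means the player scored more than the model projected.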

Continue reading

Redefining Shot Quality: One Pass at a Time

Shot quality has been a topic of late on Hockey Twitter and various sites. Only a few weeks ago, the Hockey-Graphs Hockey Talk was centered around this topic. Shot quality is a lightning rod, and much of the talking at or past one another that people often do stems from a single issue: there is no agreed-upon definition of what people mean when they say “shot quality.” Well, I like what our own Nick Mercadante had to say on the subject:

Establishing a base, repeatable skill that accounts for pre-shot movement and an increased likelihood of a goal being scored is what we need to properly analyze player contributions. Quantifying passing also gives us another actionable piece of data that everyone understands and coaches can use as well. Often, the simplest metric or method is the best. And we should be able to do that now that we’ve obtained a significant set of data. This chart may look familiar, but it’s essential to understanding how important passing is to goal-scoring. This is from all tracked passing sequences from the six teams (Chicago Blackhawks, Florida Panthers, New Jersey Devils, New York Islanders, New York Rangers, and Washington Capitals) that we tracked last season.

[Chart: SH% by passing sequence]

From this point on, I want you to forget whatever it is you think of when you hear the term, “shot quality.”

Continue reading

Toronto Maple Leafs Passing and Linkup Network

Recently, I wrote on some data I’d collected from the 2015 OHL Final between the Erie Otters and the Oshawa Generals. That piece primarily focused on the passing network analysis that Steve Burtch introduced at the Rochester Institute of Technology Hockey Analytics Conference. Today, I’d like to use the thirteen games we’ve collected to examine the Leafs’ network.

I’ll be focusing on the weighted degree measure that weights each degree (pass or shot) that a player has within the network. These weights are assigned based on several factors (scoring chance, shot on goal, one-timer, etc.) so we know which connections were more likely to result in a goal than others. This weighting will be adjusted as we get more data, so it’s quite basic and likely not nuanced enough at this point in time.
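A minimal Python sketch of a weighted degree calculation follows. Weighted degree here means the sum of the weights on every connection a player is part of. The weight values, the connection categories, and the sample edges are all hypothetical; the actual weighting used in the post is not given in this excerpt.

```python
from collections import defaultdict

# Hypothetical weights per connection type (not the post's real values).
WEIGHTS = {
    "pass": 1.0,
    "shot_on_goal": 2.0,
    "one_timer": 2.5,
    "scoring_chance": 3.0,
}

# Each edge: (player who made the play, player who received it, connection type).
# Player labels and events are made up for illustration.
edges = [
    ("A", "B", "pass"),
    ("B", "C", "scoring_chance"),
    ("A", "C", "one_timer"),
    ("C", "A", "pass"),
]


def weighted_degree(edges, weights):
    """Sum the weight of every connection touching each player."""
    totals = defaultdict(float)
    for source, target, kind in edges:
        w = weights[kind]
        totals[source] += w  # credit the player who made the play
        totals[target] += w  # credit the player on the receiving end
    return dict(totals)


result = weighted_degree(edges, WEIGHTS)
for player, score in sorted(result.items(), key=lambda kv: -kv[1]):
    print(player, score)
```

Restricting `edges` to the events involving one line and pairing would give the per-unit networks rather than the team-wide one.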

I’ve used the weighted degree measure because I think it is the best way to use this type of analysis for this sport. This is for a few reasons, some of which I mentioned in the Erie piece, but the biggest is this: not all players are on the playing field at the same time, so there are actually several networks within a single game (first line and first pairing, second line and second pairing, and so on). This may level out over the course of a season, but we’re going to look at the Leafs as a whole, and the Leafs’ top line network on its own as well. All data is at 5v5 unless otherwise specified. These types of metrics have a purely offensive-minded focus as well.

Continue reading

Using Cluster Analysis To Identify Player Position

What did you think the first time you watched hockey? Did you know the difference between a forward and a defensive skater? Could you tell the difference just by watching? It’s likely that some outside factor (a friend, the play by play announcer, a graphic on the broadcast) alerted you to the fact that NHL teams use more than one type of skater.

But, say that outside variable never intervened, and you were left to your own devices. How long would it take for you to develop the idea of “forwards” and “defensive skaters”? Would you come up with your own classifications? Would you differentiate them at all?

Continue reading