Behind the Numbers: Theory on Environmental Impacts and Chemistry

We’re bringing it back! Every once in a while I will rant on the concepts and ideas behind what numbers suggest in a series called Behind the Numbers, as a tip of the hat to the website that brought me into hockey analytics: Behind the Net. My ramblings will look at the theory and philosophy behind analytics and their applications given what is already publicly known, keeping my job safe while still getting to interact with the public hockeysphere.

I’m back and here to ramble on things like models, sheltering, and environmental impacts on the results we measure.

Continue reading

Quantifying the influence of no-trade clauses, signing bonuses and LTIR on NHL cap tables

The recently agreed CBA extension and MOU (April 2020) includes provisions suggesting a flat salary cap for years to come, and as a result, general managers and players have experienced an unprecedented draft, free agency and arbitration marketplace this fall. NHL league activity is expected to continue under a particularly unique context caused by loss of hockey related revenue from the Covid-19 pandemic, and the upcoming Seattle expansion draft.

Under this challenging and uncertain financial landscape, I endeavored to conduct contract research to better identify league-wide contract negotiation trends and evaluate the anticipated flexibility of NHL teams’ salary cap structures by looking at:
– No-trade clauses
– Signing bonuses (S.B.)
– Injury reserve (IR) and long term injury reserve (LTIR)

Previous Contract Analysis Work

Having started my journey in analytics with the opportunity to grow as part of the inaugural hockey-graphs mentorship program, it is a privilege to take this opportunity to build on the inspiring contract negotiation and player valuation work of Matt Cane (The Time Value of Money and Player Valuation), Mike Zsolt (The Financial Frontier: Defining characteristics of competitive salary cap management), Josh and Luke Younggren (Projecting NHL Skater Contracts for the 2019 Offseason), and Shayna Goldman (ISOLHAC: How can we better our contract analysis), amongst other distinguished leaders in the analytics community.

Continue reading

How Canada and the US differ in their roster philosophies during Olympic cycles

While the 2022 Beijing Winter Olympics are still over a year away and the memories of Pyeongchang are still fresh in many fans’ minds (with only one World Championship taking place since then), centralisation for both Canada and the USA is rapidly approaching. Countries historically pick their rosters around late May or the beginning of June in the year prior to the Olympics, allowing time for players to train, bond and participate in exhibition games before the final roster selection, which occurs just a month before the big event. What goes on during those 9 months prior to skating out onto that Olympic ice surface is largely kept a secret, with roster decisions often being announced in a somewhat cut-throat manner and additional players often being drawn in from outside the bubble to the surprise of everyone. Throughout this article, we will be looking at the survival rates of skaters on National Teams over the past 30 years and investigating what this means for roster selection heading into Beijing.
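The excerpt does not define "survival rate", so the following is only a minimal sketch under one plausible reading: the share of skaters on one Olympic roster who also appear on the next. The function name and the placeholder skater labels are assumptions for illustration, not real roster data.

```python
def survival_rate(previous_roster, current_roster):
    """Share of the previous roster that also appears on the current roster.

    Assumed definition (not taken from the article): returning skaters
    divided by the size of the previous roster.
    """
    previous = set(previous_roster)
    if not previous:
        raise ValueError("previous roster is empty")
    returning = previous & set(current_roster)
    return len(returning) / len(previous)


# Placeholder labels only; these are not real players or real rosters.
roster_prev = ["Skater A", "Skater B", "Skater C", "Skater D"]
roster_next = ["Skater A", "Skater B", "Skater E", "Skater F"]

rate = survival_rate(roster_prev, roster_next)
print(f"{rate:.0%} of the previous roster returned")
```

Computing this for each pair of consecutive Olympic cycles would give one number per country per cycle, which can then be compared across the two programs.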

In 2018, between the two teams, there were only 3 first-time players. Cayla Barnes and Sidney Morin both lined up for the USA on the big stage, while Sarah Nurse did the same for Canada. That is of course not to say these players didn’t have prior international experience. Nurse made her national team debut at the 2015 4 Nations Cup and had also represented Canada at the U18 level. Cayla Barnes, while just 18 at the time of centralisation, had played for the United States 3 times at the U18s, including captaining them to a gold medal that very year, while Morin had previously represented the USA at the 2017 The Time Is Now Tour. While there were only 3 ‘true’ rookies between the two teams, that is not to say this was the same line-up as at the previous Olympics in Sochi: Team Canada had 8 players missing from their gold medal-winning Sochi side, and the USA was missing 7. I have put their names below as we will return to them later.

CANADA                 USA
Caroline Ouellette     Alex Carpenter
Catherine Ward         Anne Schleper
Gillian Apps           Josephine Pucci
Hayley Wickenheiser    Julie Chu
Jayna Hefford          Kelli Stack
Jennifer Wakefield     Lyndsey Fry
Lauriane Rougeau       Michelle Picard
Tara Watchorn
Skaters from the 2014 rosters not included in the 2018 rosters
Continue reading

Building a Shot-Plotting App in Shiny

For me at least, hand tracking is 99% of the time born out of necessity. 

The only way I am ever going to get location data for shots is if I break out a multicoloured pen and write down all the locations and numbers myself. It isn’t, however, exactly the quickest process to deal with.

The thing is, I actually really enjoy hand tracking. It keeps me focused on the game at hand and stops my mind from wandering. The issue comes when it’s time to digitise that information for analysis. I have written about this before over at The Ice Garden, back when I tracked an entire season of the Australian Women’s Hockey League. That season it took me around an hour of straight work to plug in every piece of information so that Tableau could process it, and as my life got busier, the amount of free time I could dedicate got less and less.

The idea to force a Shiny app to do something it has no right to do came out of necessity. Partly it was because I wanted to be able to show heat maps to the Head Coach of the local team I work with during intermission. Mostly, though, it was because my Master’s project consists of getting school kids aged 11+ involved in sports analytics: I really wanted them to be able to produce their own heat maps, and I really did not want to attempt to explain the complexities of kernel density charts to a collection of 12-year-olds.

So here we are. 

The Hockey Plotter 1.1

Continue reading

Chatter Charts – Visualizing Real-Time Fan Reactions

Today, I’ll explain the methodology behind Chatter Charts and show you how I use statistics, R and Python to analyze hockey from a completely unexplored angle: your point of view.

I. Introducing Chatter Charts

Chatter Charts is a sports visualization that mixes statistics with social media data. And unlike most charts, it is specifically designed to thrive on social media; it is presented in video and filled with volatility, humour, and relatable moments.

It assumes a game is like a linear story—filled with peaks and troughs—except every story is written by fan comments on social media. It actually tries to recreate the emotional roller coaster fans tend to experience when watching sports.

But most people don’t know about the math and code behind Chatter Charts. It isn’t just me picking words I think are funny or a simple word count—it uses a topic modeling technique called TF-IDF to statistically rank them.

I want to go through that with you today.
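As a rough illustration of the idea, here is a minimal TF-IDF sketch in plain Python. It is not the Chatter Charts code: it assumes comments are grouped into time windows, treats each window as one "document", and uses whitespace tokenization. The function names and the sample comments are made up for the example.

```python
import math
from collections import Counter


def tf_idf_by_window(windows):
    """Score every word in every window with TF-IDF.

    windows: dict mapping a window label to a list of comment strings.
    Each window is treated as a single document (an assumption of this sketch).
    """
    # Bag of lowercase words per window.
    counts = {
        label: Counter(word for comment in comments for word in comment.lower().split())
        for label, comments in windows.items()
    }
    n_windows = len(counts)

    # Document frequency: in how many windows does each word appear?
    doc_freq = Counter()
    for bag in counts.values():
        doc_freq.update(bag.keys())

    scores = {}
    for label, bag in counts.items():
        total = sum(bag.values())
        scores[label] = {
            # Term frequency times inverse document frequency.
            word: (n / total) * math.log(n_windows / doc_freq[word])
            for word, n in bag.items()
        }
    return scores


def top_word(window_scores):
    """Return the highest-scoring word in one window."""
    return max(window_scores, key=window_scores.get)


# Made-up comments, purely for illustration.
sample = {
    "1st": ["lets go boys", "go go go"],
    "2nd": ["what a goal", "goal lets go"],
}
scores = tf_idf_by_window(sample)
print(top_word(scores["2nd"]))
```

Words that show up in every window, like "go" here, get an inverse document frequency of zero, so they never outrank a word that is specific to one moment of the game. That is the property that separates this from a simple word count.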

Continue reading

Applied Prospect PipeLinE (APPLE): Assisting the analysis of hockey prospects using Recurrent Neural Networks

The NHL Draft acts as the proverbial reset of the NHL calendar. Teams re-evaluate the direction of their organizations, make roster decisions, and welcome a new crop of drafted prospects into the fold. Irrespective of pick position, each team’s goal is to select players most likely to play in the NHL and to sustain success. Most players arrive in the NHL in their early 20s, which leaves teams having to project what a player will be 4-5 years out. This project attempts to address this difficult task of non-linear player projections. The goal is to build a model for NHL success/value using a player’s development, specifically using all current/historical scoring data to estimate the performance of a player in subsequent seasons and the possible leagues the player is expected to be in.

Continue reading

Racial Bias in Drafting and Development: The NHL’s Black Quarterback Problem

Introduction

It is far from shocking that the National Hockey League has no peer among major American sports leagues in terms of racial homogeneity. Most estimates place the proportion of White players in the league in the range of 92-95%, far from comparable leagues like the National Football League, National Basketball Association and even Major League Baseball.

In the past year, the league celebrated an obscure but rather dubious milestone. If you combined all the faceoffs* taken by every Black player** in the NHL between 2008 and 2019, you would end up with 14,375 total faceoffs, or about 20 fewer than Golden Knights center Paul Stastny in that time frame (according to Hockey-reference.com). It was only in this past season that the total of the Black players overtook Stastny.

Continue reading

Examining Player Development in NCAA DI Women’s Hockey with Game Score Pt. 2

Continued from Pt. 1

When do women’s hockey players reach their peak? How do they develop? These questions may sound straightforward, but they are exceedingly difficult to answer because of the finite opportunities for players to pursue high-level post-collegiate hockey. There is no consensus “top” professional league in the world, and major international tournaments are brief; conclusions we draw from them can be heavily skewed by the group format.

For all these reasons and more, NCAA DI (Division I) is a logical place to explore player development. It is data-rich, relative to the rest of women’s hockey, and Carleen Markey’s work with aging curves placed CWHL (Canadian Women’s Hockey League) skaters’ peak offensive production between the ages of 22 and 23. That falls within the range of many collegiate careers.

Credit: Carleen Markey

The Pipeline

The zenith of skill and competition in the world of women’s hockey is reached at the Olympics and the IIHF Women’s World Championship. These tournaments are filled with, and often dominated by, active DI players and alumnae. As one might expect, the majority of those players represent Team USA and Team Canada.

At the 2019 Worlds in Espoo, Finland, all of Team USA’s roster and 20 of the 23 players on Team Canada spent at least one year in an NCAA DI program, compared to just five of the 23 players on Team Finland’s silver medal-winning team, and one player on Team Russia’s fourth-place team. 

That said, there are more international players playing college hockey in North America every year. Per biographical data on EliteProspects.com, the share of international players in DI hockey climbed from 4.17 percent in 2015-16 to 5.07 percent in 2019-20.

Those percentages don’t mean much without the context of the women’s hockey landscape across the globe. According to the IIHF, there are 88,732 registered female players in Canada and 82,808 in the U.S. Outside of North America, there are 26,381 registered players in Sweden, Finland, Czech Republic, Russia, France, Germany, Switzerland, Japan, and Norway combined.

Continue reading

Examining Player Development in NCAA DI Women’s Hockey with Game Score Pt. 1

Carleen Markey broke new ground with her presentation on women’s hockey aging curves in the CWHL (Canadian Women’s Hockey League) at RITSAC 2019. Her work, which was built from the scaffolding of the Evolving Wild twins’ aging curves, established that offensive production among CWHL skaters peaked around age 22 to 23. That work by Markey got me thinking about how players developed just before going pro in North America and Europe, and/or becoming fixtures on national teams.

So, I set my eyes on NCAA DI (Division I) women’s hockey.

DI schools have served as the primary pipeline of talent for Team Canada and Team USA for decades. Furthermore, DI schools have served as a valuable proving ground for many of the most talented European players in the world. With Carleen’s work in mind, I set out to analyze how skaters developed in DI hockey before they reached their peak production years and their athletic prime.

Approach 

The greatest obstacle to any statistical analysis of the women’s game is the scarcity of public data. Fortunately, NCAA DI is something of an exception because of sites like collegehockeystats.net, collegehockeynews.com, and the database on HockeyEastOnline.com.

I decided on developing a game score for DI hockey to serve as an all-in-one stat that could provide a rough measure of a player’s overall impact or value. Dom Luszczyszyn first applied game score to hockey, and his work provided a framework. Creating game score for DI hockey was also appealing because I was able to apply lessons learned from working with Shawn Ferris’ NWHL (National Women’s Hockey League) game score. At the time, this sounded like fewer headaches for me. I was wrong; I had forgotten how many headaches there were the first go around.

Continue reading

How to Debug Data Science Code

Think of everyone who has a talent you admire. Athletes, writers, anyone. If you were to ask each of them for the secret to their success, how many of them would be able to give the true answer? I’m not saying that they would deliberately lie. Rather, it’s just genuinely very hard to objectively assess oneself and turn natural implicit behaviors into explicit lessons that can be described to others.

Implicit lessons can be a barrier to people learning new skills: it’s much harder to learn something if their instructor doesn’t know it’s something they ought to teach. The best teachers are able to put themselves into the shoes of their students and convey the most important pieces of information.

One area of data science that is too often left implicit is troubleshooting. Everyone who writes code will get error messages. This is frustrating and can halt progress until solved. Yet most resources devoted to teaching new data scientists don’t discuss what to do, as if they’re expected to study enough to code everything correctly the first time and never encounter an unexpected error. You can find articles about common mistakes that data scientists make, but what about when you inevitably make an uncommon one? There are very few resources around how to debug broken code. (This one is quite nice, and these two are worth a read as well.) 

That’s what I’m hoping to partially remedy with this article. It’s far from the single canonical process for debugging, but I hope that it helps people get unstuck while they learn. The key points I want to convey are:

  • Every data scientist hits error messages regularly, and doing so as a new programmer is not a sign of failure
  • Isolate the issue by finding the smallest piece of code that creates the problem
  • The exact language of an error message can be extremely helpful, even if it doesn’t make sense
  • The internet is (only in this particular instance) your friend, and there are specific resources that are especially helpful for solving problems
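The second and third points can be sketched in a few lines of Python. This is only an illustrative approach, not a canonical debugging process: it runs a pipeline one small step at a time, stops at the first step that fails, and records the exact error type and message so they can be searched for. The helper `run_steps` and the toy pipeline are hypothetical.

```python
def run_steps(steps, value):
    """Run named steps in order and stop at the first one that raises.

    steps: list of (name, function) pairs; each function takes the previous
    step's output. Returns a small report instead of crashing.
    """
    for name, func in steps:
        try:
            value = func(value)
        except Exception as err:
            # Keep the exact wording: it is usually the best search query.
            return {
                "failed_step": name,
                "error_type": type(err).__name__,
                "message": str(err),
            }
    return {"failed_step": None, "result": value}


# A toy pipeline with a deliberately bad input value ("x").
steps = [
    ("parse", lambda rows: [row.split(",") for row in rows]),
    ("to_int", lambda rows: [[int(cell) for cell in row] for row in rows]),
    ("total", lambda rows: sum(sum(row) for row in rows)),
]

report = run_steps(steps, ["1,2", "3,x"])
print(report)
```

Here the report points at the "to_int" step rather than at the pipeline as a whole, which narrows the problem to a single line before any searching starts.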

Continue reading