How to Debug Data Science Code

Think of everyone who has a talent you admire. Athletes, writers, anyone. If you were to ask each of them for the secret to their success, how many of them would be able to give the true answer? I’m not saying that they would deliberately lie. Rather, it’s just genuinely very hard to objectively assess oneself and turn natural implicit behaviors into explicit lessons that can be described to others.

Implicit lessons can be a barrier to people learning new skills: it’s much harder to learn something if your instructor doesn’t know it’s something they ought to teach. The best teachers are able to put themselves into the shoes of their students and convey the most important pieces of information.

One area of data science that is too often left implicit is troubleshooting. Everyone who writes code will get error messages. This is frustrating and can halt progress until the error is resolved. Yet most resources devoted to teaching new data scientists don’t discuss what to do, as if new programmers are expected to study enough to code everything correctly the first time and never encounter an unexpected error. You can find articles about common mistakes that data scientists make, but what about when you inevitably make an uncommon one? There are very few resources on how to debug broken code. (This one is quite nice, and these two are worth a read as well.)

That’s what I’m hoping to partially remedy with this article. It’s far from the single canonical process for debugging, but I hope that it helps people get unstuck while they learn. The key points I want to convey are:

  • Every data scientist hits error messages regularly, and doing so as a new programmer is not a sign of failure
  • Isolate the issue by finding the smallest piece of code that creates the problem (see the sketch after this list)
  • The exact language of an error message can be extremely helpful, even if it doesn’t make sense
  • The internet is (only in this particular instance) your friend, and some resources are especially helpful for solving problems
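
To make the second and third points concrete, here’s a small, purely hypothetical sketch in Python (the column names and data are invented for illustration): a pandas merge fails somewhere inside a longer pipeline, so we reproduce it with the smallest possible inputs and let the exact error message point at the cause.

```python
import pandas as pd

# Two tiny frames that reproduce the problem: the join column is spelled
# differently in each one.
shots = pd.DataFrame({"game_id": [1, 2], "xg": [0.1, 0.4]})
games = pd.DataFrame({"Game_ID": [1, 2], "team": ["NYR", "CAR"]})

try:
    shots.merge(games, on="game_id")
except KeyError as err:
    # The exact message ('game_id') points at the problem: the second frame
    # spells the column "Game_ID", so the merge key doesn't exist there.
    print("KeyError:", err)

# With the cause isolated, the fix is easy to test on the same small example.
fixed = shots.merge(games.rename(columns={"Game_ID": "game_id"}), on="game_id")
print(fixed)
```

Running the full pipeline would bury that traceback under everything else; the tiny reproduction makes both the message and the fix easy to see.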


The Importance of Pressure for a Successful Forecheck

Most of my posts so far have talked about zone exits from the perspective of the team trying to break out of their defensive zone. Now, let’s flip the script and discuss the team on the forecheck. This team does not have possession of the puck, but they are in their offensive zone, which is an advantage. So, how can they regain control?


Team Level Zone Exits

From past posts, we have a general sense of the basics of zone exits: zone exits are important because they get you out of your zone and towards an opportunity to score. The key to a successful zone exit is maintaining possession, ideally by avoiding the temptation to dump the puck out.

But so far, we have only looked at zone exits league-wide. Most fans care about one particular team more than the rest, but we haven’t looked at team-level results at all. So today, let’s see how each team has performed at zone exits over the past three seasons.


Expected Goals Model with Pre-Shot Movement, Part 3: 2018-2019 Data

Yesterday we looked at the team and skater results from the 2016-2018 data that was used to train the xG model. That’s a pretty robust dataset, but it’s unfortunately a bit out of date. People care about this season, and past years are old news. So let’s take a look at the data that Corey Sznajder has tracked for 2018-2019 so far.


Expected Goals Model with Pre-Shot Movement, Part 2: Historic Team and Player Results

Intro

In the last post, we introduced a new expected goals (xG) model. It incorporates pre-shot movement, which made it more accurate than existing public xG models when predicting which shots would be goals. However, we use xG models for far more than looking at individual shots. By aggregating expected goals at the player and team level, we can get a better sense of how each of them performs.
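
As a rough illustration of what that aggregation looks like (the column names below are placeholders rather than this model’s actual output), summing each shot’s xG by shooter or by team gives totals we can compare to actual goals:

```python
import pandas as pd

# Hypothetical shot-level output: one row per unblocked shot with the model's
# xG value, the shooter, and the team. Names and values are illustrative only.
shots = pd.DataFrame({
    "shooter": ["Player A", "Player A", "Player B", "Player B"],
    "team": ["NYR", "NYR", "CAR", "CAR"],
    "xg": [0.08, 0.31, 0.05, 0.22],
    "goal": [0, 1, 0, 0],
})

# Player-level totals: expected goals vs. actual goals.
players = shots.groupby("shooter")[["xg", "goal"]].sum()
players["goals_above_expected"] = players["goal"] - players["xg"]
print(players)

# The same idea works at the team level by grouping on "team" instead.
print(shots.groupby("team")[["xg", "goal"]].sum())
```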


Expected Goals Model with Pre-Shot Movement, Part 1: The Model

There are few questions in hockey analytics more fundamental than who played well. Consequently, a large portion of hockey analysis has been focused on how to best measure results. This is some of the best-known work in “fancy stats”; when evaluating players and teams, many people who used to look at goals scored moved to focusing on Corsi and then expected goals (xG).

The concept of an xG model is simple: look at the results of past shots to predict whether or not a particular shot will become a goal. Then credit the player who took the shot with that “expected” likelihood of scoring on that shot, regardless of whether or not it went in. Several such models have been developed, including by Emmanuel Perry, Evolving Wild, Moneypuck, and many others.
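
As a rough sketch of that general idea (not the code or feature set of any of the models named above), you could fit a simple classifier on past shots and treat the predicted probability as each shot’s expected goal value:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy shot data: distance and angle stand in for whatever features a real
# model would use, and "goal" records whether the shot went in.
past_shots = pd.DataFrame({
    "distance": [10, 35, 55, 12, 40, 8],
    "angle": [5, 30, 45, 10, 20, 2],
    "goal": [1, 0, 0, 0, 1, 0],
})

model = LogisticRegression()
model.fit(past_shots[["distance", "angle"]], past_shots["goal"])

# Credit each shot with its predicted probability of becoming a goal,
# regardless of whether it actually went in.
past_shots["xg"] = model.predict_proba(past_shots[["distance", "angle"]])[:, 1]
print(past_shots)
```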

However, there remains additional room for improving these models. They do impressive work based on the available play-by-play (pbp) data, but that only captures so much. There are big gaps in information, and we know that filling them would make us better at predicting goals.

Perhaps the biggest gap is pre-shot movement. We know that passes before a shot affect the quality of the scoring chance, but the pbp data does not include them. Thankfully, Corey Sznajder’s data does. While it does not cover every single shot over multiple seasons, it is a substantial dataset; when I pulled the data for this model, it had roughly half of the 2016-2017 and 2017-2018 seasons included: 72 thousand shots from 1,085 games. While the number of games tracked varies by team, we have at least 43 games for every team except Vegas, for which we have 26. We can use this data to build the first public xG model that incorporates passes.


Exit Types Don’t Affect Entry Quality (Much)

Last time, we saw that a team exiting its defensive zone with possession is much more likely to enter its offensive zone. Do the advantages end there, or do possession exits also improve the quality of zone entries? Perhaps leaving the defensive zone with possession makes it easier to keep possession on the way into the offensive zone, and that leads to more shots per entry. Maybe pass-outs create space for more passes in the offensive zone, which improves shot quality.

It turns out that there is not much of a difference in entry quality by exit type; exiting with possession makes it more likely to gain the offensive zone, but the advantages quickly dissipate. That said, there are some interesting variations in how those zone entries play out. The differences are small enough that they could be random chance, but it’s worth taking stock of what we know with the data we have.


Why Possession is the Key to Zone Exits

If there’s anything you know from neutral zone analytics, it’s probably this: carry-in zone entries are better than dump-ins. In the linked piece, Eric Tulsky finds that “maintaining possession of the puck at the blue line (carrying or passing the puck across the line) means a team will generate more than twice as much offense as playing dump and chase”.

But what about zone exits? Is possession equally important there? Work by Jen Lute Costello suggests that it is, but her data was limited to one playoff series. Today, I’ll expand on her work to show that maintaining possession is crucial for successful zone exits, and breakouts should be structured with this in mind.


Introduction to the Transition Project

This is one of my favorite plays:

Almost every team is coached to make their opponent fight for every inch. Skjei’s end-to-end rush cuts through those defenses and leaves his team in a much better position than when he started.

But just how much better off did he leave them? How does that compare to alternative outcomes? And which players are the best at making these plays? We have unanswered questions about transitional play. We’d like to study them in more detail, but plays like the one in the gif above don’t appear anywhere in the league’s play-by-play data to help conduct that analysis.
