Behind the Numbers: What Makes a Stat Good

By MithrandirMage [CC BY-SA 3.0], via Wikimedia Commons

Every once in a while I will rant on the concepts and ideas behind what numbers suggest in a series called Behind the Numbers, as a tip of the hat to the website that brought me into hockey analytics: Behind the Net.

Hey! Remember me?

I work full-time for (slash help run) HockeyData, a data tracking and analysis company. This conflict of interest limits what I can talk about. The good news is that I can still talk about generalities, the basics behind analytical thinking in hockey, and other people’s good work, which fits my Behind the Numbers series.

Why have there been so few updates then? Been busy (…lazy).

One generality I’d like to rant about is how we look at and evaluate statistics and models: how meaningful different numbers are and why we view them that way.

Friday Quick Graphs: Update on Predictive Relationships

The above graph is a slight variation of the method employed by JLikens (Tore Purdy) six years ago, almost to the day. The difference is that my method was extremely simplified: all I did was look at the correlation between each metric over the first 20 games and goals over the next 62 games of the season, with both variables measured at 5v5 and adjusted for score and venue (home/road). I also skipped the lockout-shortened season because it had too few games.
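A minimal sketch of this kind of split-season check, in Python, might look like the following. It assumes each team’s season is a hypothetical list of 82 per-game `(metric, goals)` pairs that are already 5v5 and adjusted for score and venue. Averaging the metric over the first 20 games and averaging goals over the remaining games is my assumption about how to aggregate, not necessarily how the graph was built.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_season_correlation(team_seasons: list[list[tuple[float, float]]], split: int = 20) -> float:
    """Correlate each team's early-season metric with its goals over the rest of the season.

    team_seasons: one game log per team; each game is a hypothetical (metric, goals) pair,
    assumed to be 5v5 and already adjusted for score and venue.
    """
    early_metric = []
    late_goals = []
    for games in team_seasons:
        first, rest = games[:split], games[split:]
        early_metric.append(sum(metric for metric, _ in first) / len(first))
        late_goals.append(sum(goals for _, goals in rest) / len(rest))
    return pearson_r(early_metric, late_goals)


# Toy data: three teams whose per-game metric and goal share never change.
toy_seasons = [[(v, v)] * 82 for v in (0.45, 0.50, 0.55)]
print(split_season_correlation(toy_seasons))
```

With real data, the toy seasons would be replaced by actual game logs, and each metric in the graph would get its own correlation.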

Behind the Numbers: Scientific Progress and Diminishing Returns in Hockey Statistics

As the hockey analytics community pushes for validation of current metrics and their value, I think it sometimes gets lost that we do understand these statistics have weaknesses, and we actively try to improve on them.

I also think an often overlooked fact is that each incremental improvement diminishes the potential value of every subsequent improvement.

Let’s take a look at what I mean…
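One way to picture this is a toy model. This is my assumption, not something taken from the post: suppose a metric’s predictive power has a ceiling set by randomness, and each improvement closes a fixed share of the gap still remaining to that ceiling. All numbers below are hypothetical.

```python
def improvement_gains(start: float, ceiling: float, share_closed: float, steps: int) -> list[float]:
    """Gain from each successive improvement when each closes a fixed share of the remaining gap.

    start, ceiling: hypothetical predictive power (e.g. a correlation) before any improvement,
    and the most any metric could reach; share_closed: fraction of the remaining gap that
    each improvement removes.
    """
    gains = []
    current = start
    for _ in range(steps):
        gain = (ceiling - current) * share_closed
        gains.append(gain)
        current += gain
    return gains


gains = improvement_gains(start=0.30, ceiling=0.50, share_closed=0.5, steps=4)
print([round(g, 4) for g in gains])  # [0.1, 0.05, 0.025, 0.0125]
```

Each improvement here is worth half of the one before it, even though every step is equally “good” relative to what was left to gain.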

Garret’s look back at VanHAC

Hello all,

First off, Josh and I want to thank everyone for making VanHAC17 such a wonderful success: the Vancouver Canucks for hosting, catering, and supplying so much support and so many resources; our financial sponsors, Canucks Army and HockeyData; our helpful registration desk volunteers; our panelists, Dan Murphy and Dimitri Filipovic; our presenters (more on them below); and, with a huge round of applause, our wonderful keynote speaker, Meghan Chayka.

Let me break down how this conference and the weekend surrounding it went from my perspective.

Behind the Numbers: The issues with binning, QoC, and scoring chances

Almost weekly, you will see a “quant” or “math” type complain about some of the binning going on (usually with Quality of Competition or scoring chances).

But the reason may not seem intuitive, so I’ll use scoring chances as an example and explain the issues with binning continuous data.
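Here is a small sketch of the problem in Python. It treats each shot’s probability of becoming a goal as the underlying continuous measure, and it calls any shot at or above a hypothetical 0.20 cutoff a scoring chance. Both the cutoff and the shot values are made up for illustration.

```python
CHANCE_CUTOFF = 0.20  # hypothetical threshold for labelling a shot a "scoring chance"


def scoring_chances(shot_probs: list[float], cutoff: float = CHANCE_CUTOFF) -> int:
    """Binned view: count shots at or above the cutoff and discard everything else."""
    return sum(1 for p in shot_probs if p >= cutoff)


def expected_goals(shot_probs: list[float]) -> float:
    """Continuous view: sum every shot's goal probability."""
    return sum(shot_probs)


team_a = [0.19, 0.19, 0.19, 0.19, 0.19]  # five shots just below the cutoff
team_b = [0.21, 0.03, 0.03, 0.03, 0.03]  # one shot just above it, four poor ones

print(scoring_chances(team_a), scoring_chances(team_b))  # 0 1
print(round(expected_goals(team_a), 2), round(expected_goals(team_b), 2))  # 0.95 0.33
```

The binned count says Team B generated more chances, while the continuous measure says Team A created nearly three times as much. A 0.19 shot and a 0.21 shot are nearly identical, yet the cutoff treats them as different in kind, and it treats a 0.21 shot the same as a 0.60 one.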

Friday Quick Graphs: Marginal Gains for Defenders

Last Friday we asked how many goals improving a team’s first line is worth versus improving its fourth line. What about defenders?

The above graph shows how many goals over a season a team should expect to gain by improving a player’s shot differential talent, described here in percentiles of talent.

The blue line is the first pair, with the second and third pairs following in red and yellow.

The blue line is the steepest, suggesting that moving from a 55th percentile player to a 60th percentile player on the top pair will improve a team’s goal differential more than the same move on the second or third pair. (This is not to be confused with improving from a 55% Corsi player to a 60% Corsi player.)

Notice how the difference between the top and middle pairs is pretty negligible: the gain from improving an average (median, 50th percentile) defender to the absolute best differs by only about half a goal between the top and middle pairs. This may be because teams often place their second-best defender on the second pair, whether by strategy and design or because handedness “forces” the team’s hand.

As a reminder, the coefficients we found for forwards were 0.24, 0.12, 0.12, and 0.06 for the first through fourth lines. This may seem to suggest that improvement should be concentrated on the top forward line, followed by the top-four defenders, and then the middle-six forwards and the bottom pair. However, our method is agnostic of usage and of who drives shot differentials more, forwards or defenders.
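To make the forward coefficients concrete, here is a minimal sketch. It assumes (my reading, not stated in the post) that each coefficient is the slope of its line in the graph, measured in goals over a full season per percentile point of shot differential talent, and that the lines are straight.

```python
# Assumption: slopes in full-season goals per percentile point, first through fourth lines.
FORWARD_COEFFICIENTS = {1: 0.24, 2: 0.12, 3: 0.12, 4: 0.06}


def expected_goal_gain(line: int, from_pct: float, to_pct: float,
                       coefficients: dict[int, float] = FORWARD_COEFFICIENTS) -> float:
    """Season goal-differential gain from moving one player between talent percentiles."""
    return coefficients[line] * (to_pct - from_pct)


for line in FORWARD_COEFFICIENTS:
    print(f"line {line}: {expected_goal_gain(line, 55, 60):.2f} goals")
```

Under that reading, a 55th-to-60th percentile upgrade is worth twice as much on the first line as on the second or third, which matches the forwards graph.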

Friday Quick Graphs: Marginal Gains for Forwards

How many goals is improving a team’s first line worth versus improving its fourth line?

The above graph shows how many goals over a season a team should expect to gain by improving a player’s shot differential talent, described here in percentiles of talent.

The blue line is first-line forwards, with second-, third-, and fourth-liners following in red, yellow, and green.

The blue line is the steepest, suggesting that moving from a 55th percentile player to a 60th percentile player on the top line will improve a team’s goal differential by about twice as much as the same move for a second- or third-line player. (This is not to be confused with improving from a 55% Corsi player to a 60% Corsi player.)

What is interesting is that the marginal gains from improving a second-line player and a third-line player are about equal.

The next question one should ask is: what are the costs in salary and cap hit for making said improvements?

Method:

  1. All forwards over all available full seasons were sorted by 5v5 TOI/GP
  2. Players were binned into four groups with an equal number of games played
  3. Each bin was then sorted by Corsi% and binned into percentiles
  4. Goal differentials were extrapolated to a full season using the average TOI per season for each line (so differing rates of injuries and press-box banishment are included)
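The four steps can be sketched in Python as follows. The `ForwardSeason` record and its fields are hypothetical. “Equal number of games played” is read here as equal total games per bin, percentiles use a simple midpoint-rank formula, and the extrapolation assumes a goal differential rate per 60 minutes scaled by a line’s average season TOI. The post does not spell out any of these details, so treat this as one plausible reading.

```python
from dataclasses import dataclass


@dataclass
class ForwardSeason:
    """One forward's season (hypothetical fields)."""
    name: str
    toi_per_gp: float   # 5v5 minutes per game played
    games_played: int
    corsi_pct: float    # 5v5 Corsi%, e.g. 52.3


def assign_lines(players: list[ForwardSeason], n_lines: int = 4) -> list[list[ForwardSeason]]:
    """Steps 1-2: sort by 5v5 TOI/GP, then cut into bins holding roughly equal total games."""
    ordered = sorted(players, key=lambda p: p.toi_per_gp, reverse=True)
    target = sum(p.games_played for p in ordered) / n_lines
    bins: list[list[ForwardSeason]] = [[] for _ in range(n_lines)]
    games_so_far = 0
    for p in ordered:
        # Place each player by where his games start in the running total of games.
        index = min(int(games_so_far // target), n_lines - 1)
        bins[index].append(p)
        games_so_far += p.games_played
    return bins


def corsi_percentiles(line_players: list[ForwardSeason]) -> dict[str, float]:
    """Step 3: rank one bin by Corsi% and convert each rank to a midpoint percentile (0-100)."""
    ordered = sorted(line_players, key=lambda p: p.corsi_pct)
    n = len(ordered)
    return {p.name: 100 * (rank + 0.5) / n for rank, p in enumerate(ordered)}


def full_season_goal_diff(gd_per_60: float, avg_line_toi_minutes: float) -> float:
    """Step 4: scale a per-60 goal differential to a line's average 5v5 TOI over a season."""
    return gd_per_60 * avg_line_toi_minutes / 60


# Eight hypothetical forwards, each with 82 games and descending ice time.
players = [ForwardSeason(f"F{i}", 16.0 - i, 82, 45.0 + i) for i in range(8)]
lines = assign_lines(players)
print([[p.name for p in line] for line in lines])
print(corsi_percentiles(lines[0]))
print(full_season_goal_diff(0.5, 900))
```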

Behind the Numbers: Scoring first and conditional probability

Not long ago Jason Gregor tweeted about the value of scoring first.

It may be a bit controversial and difficult to accept right away, but there is nothing special about scoring first. Long ago, Eric Tulsky, now of the Carolina Hurricanes, showed that the value of scoring first equals the value of any other goal.
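A small simulation shows why a high win rate for teams that score first does not make the first goal special. It is a toy model of my own, not Tulsky’s method: two evenly matched teams each have a hypothetical 4.5% chance of scoring in any minute of regulation, every goal is independent of the game state, and ties count as non-wins (no overtime). No goal in this model is special by construction, yet the first scorer still wins more often than not. The win rate is a conditional probability, P(win | up 1-0), and the first scorer wins whenever the remaining goals split evenly or in its favour.

```python
from __future__ import annotations

import random

GOAL_PROB_PER_MINUTE = 0.045  # hypothetical: about 2.7 goals per team per 60 minutes


def simulate_game(rng: random.Random) -> tuple[int, int, str | None]:
    """Simulate regulation for two evenly matched teams, minute by minute.

    Returns (goals_a, goals_b, first_scorer), where first_scorer is "A", "B", or None.
    """
    goals_a = goals_b = 0
    first_scorer = None
    for _ in range(60):
        scorers = [team for team in ("A", "B") if rng.random() < GOAL_PROB_PER_MINUTE]
        if len(scorers) == 2:
            rng.shuffle(scorers)  # both scored this minute: order the goals at random
        for team in scorers:
            if first_scorer is None:
                first_scorer = team
            if team == "A":
                goals_a += 1
            else:
                goals_b += 1
    return goals_a, goals_b, first_scorer


def win_rate_given_scored_first(n_games: int = 20_000, seed: int = 1) -> float:
    """Estimate P(win in regulation | scored first), skipping games with no goals."""
    rng = random.Random(seed)
    games_with_goal = 0
    first_scorer_wins = 0
    for _ in range(n_games):
        goals_a, goals_b, first = simulate_game(rng)
        if first is None:
            continue  # a 0-0 game has no first scorer
        games_with_goal += 1
        if (first == "A" and goals_a > goals_b) or (first == "B" and goals_b > goals_a):
            first_scorer_wins += 1
    return first_scorer_wins / games_with_goal


print(round(win_rate_given_scored_first(), 3))
```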

Behind the Numbers: Why Plus/Minus is the worst statistic in hockey and should be abolished

Plus/minus may be the worst statistic in hockey, although goalie statistics not based on save percentage are also in the debate (like GAA or Win%, which just add a team component to a goalie’s save percentage). It could even be in contention for the worst statistic in all of sport.

Now, some people may read that and think I’m simply saying this because I value shot metrics over goal metrics in player evaluations. While I do feel that way, it is only one of a few reasons that plus/minus fails to be useful.
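The aside about goalie statistics comes down to simple arithmetic. Under the standard definitions, goals against equal shots against times (1 − Sv%), and GAA is goals against per 60 minutes. GAA is therefore a goalie’s save percentage multiplied by the shot volume his team allows. The goalies and numbers below are hypothetical.

```python
def goals_against_average(shots_against: float, save_pct: float, minutes: float) -> float:
    """GAA: goals against per 60 minutes, with goals against = shots against * (1 - Sv%)."""
    goals_against = shots_against * (1 - save_pct)
    return goals_against * 60 / minutes


# Two hypothetical goalies with identical .920 save percentages over 3000 minutes each;
# the only difference is how many shots their teams allow.
light_workload = goals_against_average(shots_against=1500, save_pct=0.920, minutes=3000)
heavy_workload = goals_against_average(shots_against=1700, save_pct=0.920, minutes=3000)
print(round(light_workload, 2), round(heavy_workload, 2))  # 2.4 2.72
```

Both goalies stop the same share of shots, but the one behind a team that allows more shots ends up with the worse GAA.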
