Behind the Numbers: Where analytics and scouts get the draft wrong

Every once in a while I will rant on the concepts and ideas behind what numbers suggest in a series called Behind the Numbers, as a tip of the hat to the website that brought me into hockey analytics: Behind the Net. My ramblings will look at the theory and philosophy behind analytics and their applications given what is already publicly known.

Hello everyone; I am back! I was in the process of writing an article on NHL prospect development for after the draft (teaser!) when a Twitter thread sparked my interest and made me want to do a bit of a ranty, very pseudo-Editorial or Literature Review on analytics and the draft while combing over that thread.

The tweet thread has quite a lot of meat to chew on, but I agree with High’s general point and the majority of the specific details. I also thought that he suggests some really, really great people to follow for public scouting resources.


I find it very interesting that many of the same people who dive into NHL advanced stats and create/support models are huge supporters of NHLe in prospect evaluation

I agree; it is very intriguing and a bit ironic that the same community arguing “points are overrated” for measuring an NHL player’s overall value also adamantly uses point production to argue about prospects. Now there are some reasons why these cases are not equivalent, but I will get back to that later.

Most modern NHL “WAR models” rightfully ignore or lightly weight points. We should care more about whether a player improves their team’s goal rate (or chances to produce goals) than if they were specifically the last, second last,…, or fifth last to touch the puck on those very same goals. In simpler terms: They don’t ask how, they ask how many.

Adding a box score component can –and often does– improve NHL WAR models. Knowing the “how” still gives you appreciable information on the player, but you don’t want to lose the forest for the trees.

However, most leagues with draft eligible players provide very limited public statistical data. You can’t build a powerful WAR model without more granular information. A prospect model needs to cover the most limited applicable league as well.

Point production still carries a lot of value; it is worth less, not worthless. Looking at all NHL career values for forwards with 50 or more minutes of ice time for 2007-2022, player points per game variation “explained” (r-squared) 51% of the variation in Evolving Hockey’s Goals Above Replacement model. That’s wild (heh, pun) given GAR estimates player value not just offensively but defensively, with penalty differentials, all while trying to adjust for environmental factors.
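For readers who want to see what “explained (r-squared)” means mechanically, here is a minimal sketch. The function name and every number in it are made up for illustration; it is not real player data and not the calculation behind the 51% figure, just the squared Pearson correlation between two columns.

```python
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values only: points per game and a GAR-like value measure.
points_per_game = [0.20, 0.35, 0.50, 0.65, 0.80, 1.00]
value_measure = [1.0, 4.0, 3.0, 9.0, 7.0, 12.0]

print(round(r_squared(points_per_game, value_measure), 2))
```

An r-squared of 0.51 would mean that about half of the spread in the value measure lines up with the spread in points per game, and the other half does not.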

Remember, I said to not lose the forest for the trees; trees still make up a great deal of the forest.

The basic goal of using analytics in professional player evaluation is to find diamonds in the rough

I disagree, but semantically; the goal of using analytics in professional player evaluation is to make more informed –and hopefully better– decisions everywhere. It is a small difference but still meaningful. Analytics do more than find more Devon Toews players.

  • It helps select the best role to place Toews
  • It helps select the best linemates for Toews
  • It helps select the best opponents for Toews
  • It helps see where you can make the best improvements for developing Toews
  • It helps see when you should flip Toews and for which players/prospects/picks
  • It helps see where the player’s perceived and actual value differ
  • etc.

Analytics are also not limited to quantitative information (i.e. stats, or what you find on an Elite Prospects page). Analytics can combine quantitative and qualitative information (i.e. traditional scouting).

Analytics can look at qualitative information and see:

  • Which player attributes are more nature vs nurture (what can and can’t be taught or fixed)
  • Which player attributes predict future success better
  • Which player attributes translate to higher levels better
  • Which scouts evaluate player attributes better than others
  • etc.

Everyone acknowledges that the McDavid’s, Matthews’, and Makar’s of the world are the best in the game, whether they believe in the value of analytics or not.

Agreed, but with caveats.

The best and worst at the extremes are easier to pick out. The “eye test” is much less likely to make errors when judging the extremes based on what it sees. After all, your “eye test” is simply a form of analytics, with your brain as the model weighting the events you remember most according to all your biases.

But less likely doesn’t mean perfect. We recently saw Leon Draisaitl win league MVP in a year where more in-depth analytics suggested he was near the best, not the best. Ironically the “eye test” defense tended to boil down to “points.” We saw Ovechkin’s 51 goal season get dragged as an example of all individual accolades with no care for team play due to +/-, even though more in-depth analysis placed his overall impact around his career average (though it was poor defensively). Ovi’s linemates’ struggling shooting percentages carried a lot of the responsibility for his highly negative +/- rating.

Heck, statistical models suggest that the traded-for-pennies Devon Toews is a player worthy of some Norris votes, not just a fill-in-the-gaps roster piece. Analytics suggest he is an elite player that was undervalued. Not everyone picked up on that early on, even some people paid to analyze hockey.

Even when you correctly know that player A is better than player B, analytics helps scale by how much, which is also important.

Draft Theory

With all that, I wanted to give a quick overview on analytical Draft Theory. The whats/hows/whys behind NHLe models and the value of statistical analysis in prospect evaluations.

As far back as possible, people have been using statistics to help evaluate prospects in hockey. The NHLe model, popularized by individuals like Gabe Desjardins, Rob Vollman, and others, was simply a way to try and turn scoring between different leagues into one currency. They did this by looking at how much players retained their points per game production when transferring from a developmental league to the NHL.

It’s much easier to compare prospects from two different leagues if you can compare apples to apples.
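A rough sketch of that “one currency” idea follows. This is an assumption-laden simplification and not any specific published NHLe model: the function names are hypothetical, the transfer data is invented, and the factor is just pooled NHL points per game divided by pooled prior-league points per game for players who made the jump.

```python
def translation_factor(transfers):
    """Pooled NHL PPG divided by pooled source-league PPG.

    `transfers` is a list of (league_ppg, nhl_ppg) pairs, one per player
    who moved from the source league to the NHL.
    """
    league_total = sum(league_ppg for league_ppg, _ in transfers)
    nhl_total = sum(nhl_ppg for _, nhl_ppg in transfers)
    return nhl_total / league_total


def nhle(league_ppg, factor):
    """Express a prospect's league scoring rate in NHL-equivalent PPG."""
    return league_ppg * factor


# Hypothetical transfers, not real players.
transfers = [(1.20, 0.40), (0.90, 0.25), (1.50, 0.55)]
factor = translation_factor(transfers)

# A prospect scoring 1.10 points per game in that league.
print(round(nhle(1.10, factor), 2))
```

With a factor per league, two prospects from different leagues can be put on the same scale before comparing them.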

Now NHLes have their issues, specifically with turning scoring into one currency. They look exclusively at the population of players that make the NHL, which creates bias in the survey population. They look at each year of a league as the same, when leagues are dynamic and change in quality. They treat all scoring as equivalent, whether goal, primary assist, or secondary assist. They treat all situations as equivalent, whether even strength, power play, or penalty kill. They look at all positions equally, as if everyone should retain production the same.

There have been attempts to account for these issues, as seen by previous work of Rhys Jessop, myself, and Jeremy Davis. This led to SEAL NHLes. SEAL adjusts for many of the issues mentioned above by adjusting for Situations (both G,A1,A2 and EV/PP/PK distribution), Era, and Age, in addition to League.

Prospect analytics also took a leap forward with the Player Cohort Success model designed by Josh Weissbock and Cam Lawrence, with some minor input by Jessop and myself. PCS looked at how often players with similar statistical profiles to a prospect succeed in making the NHL for 200+ games. It also looked at the quality of the cohorts’ performance. Davis and Dylan Kirkby made a similar model with pGPS. As did Hayden Speak, now of the LA Kings.

There was some pushback at the idea of measuring 200 games as a success. It is a semi-arbitrary line but the point wasn’t that all 200+ game players were equally valuable. More valuable players tended to have a greater proportion of cohorts successfully passing the 200 game threshold, and that gave a quality measure (and you could just look at the performance of the cohorts).
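The cohort idea can be sketched in a few lines. This is only an illustration of the general approach, not the actual PCS or pGPS implementation: the similarity rule (height and points per game within fixed tolerances), the tolerance values, and the historical players are all assumptions made up for the example. Only the 200-game threshold comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    height_cm: float
    ppg: float        # draft-year points per game
    nhl_games: int    # career NHL games played


def cohort(target_height, target_ppg, history, height_tol=3.0, ppg_tol=0.10):
    """Historical players whose profile is close to the target's (hypothetical rule)."""
    return [
        p for p in history
        if abs(p.height_cm - target_height) <= height_tol
        and abs(p.ppg - target_ppg) <= ppg_tol
    ]


def success_rate(members, threshold=200):
    """Share of cohort members who reached the NHL games threshold."""
    if not members:
        return None  # no comparable players, so no estimate
    return sum(p.nhl_games >= threshold for p in members) / len(members)


# Invented historical players, for illustration only.
history = [
    Prospect("A", 183, 0.80, 450),
    Prospect("B", 185, 0.85, 20),
    Prospect("C", 196, 0.30, 300),
    Prospect("D", 184, 0.78, 210),
]

similar = cohort(184, 0.82, history)
print(len(similar), success_rate(similar))
```

The same cohort list can then be inspected for how good the successful members became, which is where the quality measure comes from.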

In fact, using a similar methodology to PCS, I looked at statistical cohorts for Keaton Ellerby and Josh Morrissey to show that upside and safety move in the same direction. The short version is that the two groups were drafted at similar spots, but Ellerby’s cohorts were bigger with lower scoring, while Morrissey’s were smaller with higher scoring. Ellerby’s cohorts would be considered “safer” by many because it is commonly viewed that the bigger player is more likely to play a depth role even if they don’t turn into a top-half of the roster player. That part turned out to be true; Ellerby’s cohorts peaked as depth or 3rd pairing defenders much more often than Morrissey’s. However, Morrissey’s cohorts still successfully made the NHL more often despite that. Risk is, on average, inversely related to upside, not tied to floor.

Both NHLe and PCS model strategies are useful, and can be used synergistically. There are strengths, weaknesses, and biases to each. Looking at both can help mitigate issues and more (good) information is always better.

Yes, even the NHLe models are useful.

See, here’s the thing about scoring. If you simply drafted by sorting CHL skaters by points, with absolutely no other information, you would do reasonably well compared to actual historical NHL teams. This is called the Sham Sharron model test. Sham Sharron wasn’t an indication that you should draft by points, but a demonstration of how inefficient the NHL draft is, since it performed similarly well to NHL teams that watched all those games, conducted all those interviews, and paid all those scouts.
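To show just how little information that strategy uses, here is a minimal sketch of a points-only drafter. The function name, the board format, and the players are hypothetical, and this is not the original Sham Sharron code; it only captures the rule “take the highest scorer still available.”

```python
def points_only_pick(board, taken):
    """Return the name of the highest-scoring player not yet drafted.

    `board` is a list of (name, points) pairs; `taken` is a set of names
    already selected by any team.
    """
    remaining = [player for player in board if player[0] not in taken]
    if not remaining:
        return None
    return max(remaining, key=lambda player: player[1])[0]


# Invented draft board, for illustration only.
board = [("Skater A", 90), ("Skater B", 110), ("Skater C", 70)]
taken = set()

first = points_only_pick(board, taken)
taken.add(first)
second = points_only_pick(board, taken)
print(first, second)
```

Everything a scouting staff adds has to beat this baseline to be worth its cost.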

It’s not that scouts are bad. Heck, the Sham Sharron model improved if you adjusted just slightly for qualitative information by including CSS rankings.

Jessop talked about this previously, and I went a bit further. Jessop looked at CHL defensemen (a position where scoring tends to matter less than for forwards) and binned them by scoring (above or below an arbitrary 0.6 points per game) and by draft position (top, mid, and low drafted prospects).

It is interesting looking at the patterns:

  1. Players drafted earlier succeed more often within each scoring group.
  2. Just because you score more, it doesn’t mean you are better. Players who don’t score but are drafted early tend to be more successful than players who do score but are drafted late.
  3. Players who score tend to be more successful than those who don’t despite having the same draft position.
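The binning behind those patterns can be sketched as follows. The 0.6 points per game cutoff and the top/mid/low tiers come from the description above; everything else is an assumption for illustration, including the 200-game definition of success (borrowed from the PCS discussion), the choice to count exactly 0.6 as a scorer, the field names, and the invented players.

```python
from collections import defaultdict


def success_by_group(players, ppg_cutoff=0.6, games_threshold=200):
    """Success rate for each (draft tier, scorer/non-scorer) bin."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [successes, total]
    for p in players:
        # Assumption: a player exactly at the cutoff counts as a scorer.
        scoring = "scorer" if p["ppg"] >= ppg_cutoff else "non-scorer"
        key = (p["draft_tier"], scoring)
        if p["nhl_games"] >= games_threshold:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}


# Invented defensemen, for illustration only.
players = [
    {"draft_tier": "top", "ppg": 0.80, "nhl_games": 500},
    {"draft_tier": "top", "ppg": 0.70, "nhl_games": 10},
    {"draft_tier": "top", "ppg": 0.40, "nhl_games": 250},
    {"draft_tier": "low", "ppg": 0.90, "nhl_games": 300},
    {"draft_tier": "low", "ppg": 0.30, "nhl_games": 0},
]

for group, rate in sorted(success_by_group(players).items()):
    print(group, rate)
```

Comparing rates across rows with the same tier isolates the scoring effect, and comparing across tiers with the same scoring label isolates what scouts add.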

Scouts are finding value with players aside from their point production. However, they are undervaluing a player’s scoring or overvaluing the non-scoring attributes. Essentially, the patterns suggest that low scorers are drafted one entire draft group too early. That’s a significant inefficiency.

There are a few things to note about said study.

For one, the biases that cause scouts and managers to overvalue some prospects over others will likely persist in the coaches and GMs. That will have a huge impact on whether or not a prospect succeeds in playing in the NHL. This persistence in bias influences models like PCS and to a slightly lesser degree NHLes. It also means that this study might slightly undersell the actual difference between groups. As an aside, Chace McCallum has done an excellent job showing those very biases.

Secondly, many of the “non-scorers” who succeeded are those who had fairly low scoring for their draft eligible season but scored well the following year. In other words, many of the “non-scoring” defenders who succeeded were not the good shutdown defenders, but scoring-skilled types who posted a bad stat line due to poor luck and/or lack of opportunity/usage.

Example: Kris Letang was one such “non-scorer” with 0.46 points per game, but jumped to 1.13 the very next year. I don’t think anyone would argue Letang was a typical “non-scorer” player.

The reason why NHLe-type models are so important is because scoring matters, a lot. It’s not everything but it’s a relatively large chunk of the picture and it signals most of the players that GMs and scouts tend to make mistakes on. You can’t look at it exclusively, but you cannot ignore it either.

As mentioned earlier, scoring is not like a WAR model. That said, we saw that there is some overlap in the ranking of NHL forwards’ WAR and point production. It should also be noted that we should expect that overlap to be larger in developmental leagues (and in my small sample studies back in my HockeyData days I found just that).

The skill distribution in lower developmental leagues is much wider than in the NHL. The top players will someday play in the NHL at varying levels, but at the bottom are players that will go no further. Point production also correlates to usage, which correlates to the coach’s opinion. While imperfect, that gives some qualitative value (and bias) to your model.

Scoring matters but it is not everything, and I leave you with some thoughts for analyzing the draft:

  • Not all 0.7 NHLe players are equal.
  • NHLe is not strong or refined enough to compare a 0.7 vs 0.6 NHLe player with a high degree of confidence.
  • The average 0.7 NHLe player is better than the average 0.3 NHLe player.
  • Not every 0.7 NHLe player is better than every 0.3 NHLe player. The distributions overlap, just not fully.
  • Scouts tend to pick out which 0.7 NHLe players are better than others with some success.
  • Scouts tend to pick which 0.3 NHLe players are better than some 0.7 NHLe players with some success.
  • Scouts tend to overvalue 0.3 NHLe players and undervalue 0.7 NHLe players.
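The “distributions overlap, just not fully” point can be made concrete with a toy calculation. This is purely a sketch under loud assumptions: it pretends eventual player value in each NHLe group is normally distributed and independent, and the means and spreads are invented. Nothing here is estimated from real prospects.

```python
from math import sqrt
from statistics import NormalDist


def prob_first_is_better(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) for independent normal variables A and B."""
    # The difference A - B is itself normal, with variances adding.
    diff = NormalDist(mean_a - mean_b, sqrt(sd_a ** 2 + sd_b ** 2))
    return 1.0 - diff.cdf(0.0)


# Hypothetical "eventual value" distributions for two NHLe groups.
high_group = (10.0, 6.0)  # (mean, standard deviation), made up
low_group = (4.0, 5.0)    # (mean, standard deviation), made up

p = prob_first_is_better(*high_group, *low_group)
print(round(p, 2))
```

Under assumptions like these, a randomly chosen high-NHLe player beats a randomly chosen low-NHLe player most of the time but not always, which leaves room for scouts to identify the exceptions.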

And finally, with all this, I hope more good work goes back into the SEAL NHLe models, as they are the natural evolution of the limited, but useful, NHLe model.
