Comparing Scoring Talent with Empirical Bayes

Note: In this piece, I will use the phrases “scoring rate” and “5on5 Primary Points per Hour” interchangeably.


Nico Hischier and Nikita Kucherov are both exceptional talents, but suppose, for whatever reason, we want to identify the superior scorer of the two forwards. A good start would involve examining their 5on5 scoring rates in recent seasons…


Evidently, Hischier has managed the higher scoring rate, but that doesn’t convey much information without some notion of uncertainty – a significant issue considering that the two bodies of evidence are of unequal size. It turns out that Kucherov’s sample is roughly three times larger (3359 minutes vs. 1077 minutes), so it is reasonable to expect that his observed scoring rate is more indicative of his true scoring talent. However, it is unclear just how much more weight the larger sample deserves, and how that should factor into our comparison of the two players.

The relationship between evidence accumulation and uncertainty is important to understand when conducting analysis in any sport. David Robinson (2015) encountered a similar dilemma with regard to batting averages in baseball. He presented an interesting solution involving a Bayesian approach – more specifically, empirical Bayes estimation. This framework is built upon the prior belief that an MLB batter likely possesses near-league-average batting talent, and that the odds of him belonging to one of the extreme ends of the talent spectrum are comparatively low. As evidence is collected in the form of at-bats, the batter’s talent estimate can stray from the prior belief if that initial assumption stands in contrast with the evidence. The more data available for a specific batter, the less weight placed on the prior; when data is plentiful, the evidence dominates. The final updated beliefs are summarized by a posterior distribution, which can be used both to estimate a player’s true talent level and to provide a sense of the uncertainty implicit in that estimate. In the following sections, we will walk through the steps involved in applying the same empirical Bayes method employed by Robinson to devise a solution to our Hischier vs. Kucherov problem.

The Likelihood Function – Compiling the Evidence

The framework begins with defining a function that encapsulates the evidence provided by a skater’s recorded scoring history. When utilizing the Bayesian method, the mechanism which accomplishes this task is called the likelihood function. In this case, the likelihood function will divulge the relative likelihood of a skater possessing any specified underlying true scoring rate given the available evidence.

Building the likelihood function requires an understanding of the process through which primary points are produced. More specifically, this entails determining the underlying probability distribution that generates the data. In certain cases, the number of events observed in a given period of time follows a Poisson process. This type of process describes events that meet the following criteria…

  • Events are memoryless: The probability of an event does not depend on the occurrence of a previous event (Ryder, 2004, p. 3)
  • Events are rare: The number of opportunities to produce an event far exceeds the number of events that actually occur (Ryder, 2004, p. 4)
  • Events are random: Events occur in time with no apparent pattern (Ryder, 2004, p. 4)

Goal scoring in hockey certainly seems to satisfy these properties and so it should not be surprising to learn that goals are approximately Poisson distributed (Ryder, 2004). We are concerned with primary points – the goals scored in which a skater managed to be the scorer or primary passer. For now, let’s assume that primary points are also Poisson. The implication is that if we let θ represent a skater’s true underlying scoring rate per hour, then the probability of that skater managing to tally x primary points in one hour of 5on5 ice time is defined by the Poisson distribution…
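The Poisson probability in question is the standard pmf, P(X = x | θ) = θˣe^(−θ) / x!. The article’s analysis is done in R, but the calculation itself is simple enough to sketch in Python (θ = 2.0 below is purely illustrative, not either player’s rate):

```python
import math

def poisson_pmf(x, theta):
    """P(X = x | theta) = theta**x * exp(-theta) / x! -- the probability of
    tallying x primary points in one hour, given a true rate of theta per hour."""
    return theta ** x * math.exp(-theta) / math.factorial(x)
```

For example, `poisson_pmf(0, 2.0)` is about 0.135 – a skater with a true rate of 2 primary points per hour goes pointless in a given hour about 13.5% of the time.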


The Poisson process can be framed differently. Suppose scoring is Poisson and a skater’s true scoring rate, θ, is known. The probability density or relative likelihood of t hours elapsing between any given two primary points is instead modeled by the exponential distribution…
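The exponential density referenced here is f(t | θ) = θe^(−θt). A quick Python sketch, again with an illustrative θ:

```python
import math

def exponential_pdf(t, theta):
    """f(t | theta) = theta * exp(-theta * t): the relative likelihood of t hours
    elapsing between two successive primary points for a skater with rate theta."""
    return theta * math.exp(-theta * t)
```

A useful property: the mean waiting time between points is simply 1/θ, so higher true scoring rates mean shorter expected gaps.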


Whether or not the time between scoring in the NHL is actually generated by an exponential distribution can be tested with the Shapiro-Wilk Test for exponentiality (Shapiro & Wilk, 1972). The null hypothesis is that the data is exponential, and the significance level of the test is 5%. This means that for a skater’s collection of points to be rejected as exponentially generated, his points must follow a pattern in time that deviates from exponential expectations to a degree that would occur less than 5% of the time if the data were in fact exponential. Controls are created by generating artificial time distributions from both a true exponential distribution and a chi-squared distribution (1 degree of freedom) for each skater. If our data is exponential, it should mimic the test results of the actual exponential distribution and compare favorably to the chi-squared distribution.
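The article doesn’t include its testing code, but the Shapiro-Wilk statistic for exponentiality, W = n(x̄ − x₍₁₎)² / ((n − 1)Σ(xᵢ − x̄)²), together with a Monte Carlo version of the control procedure described above, can be sketched in Python. The two-sided cutoffs and simulation count here are assumptions for illustration, not the article’s exact settings:

```python
import random

def w_exp(x):
    """Shapiro-Wilk statistic for exponentiality (Shapiro & Wilk, 1972):
    W = n * (mean - min)^2 / ((n - 1) * sum((x_i - mean)^2))."""
    n = len(x)
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    return n * (m - min(x)) ** 2 / ((n - 1) * ss)

def reject_exponential(x, n_sim=2000, alpha=0.05, seed=0):
    """Two-sided Monte Carlo test: reject exponentiality if the observed W
    lands in the outer alpha of W's simulated null distribution."""
    rng = random.Random(seed)
    n = len(x)
    null = sorted(w_exp([rng.expovariate(1.0) for _ in range(n)])
                  for _ in range(n_sim))
    lo = null[int(n_sim * alpha / 2)]
    hi = null[int(n_sim * (1 - alpha / 2)) - 1]
    return not (lo <= w_exp(x) <= hi)

# The chi-squared (1 df) control can be generated as the square of a standard normal.
rng = random.Random(1)
chisq_sample = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20)]
```

Running `reject_exponential` over many truly exponential samples should reject about 5% of the time, while chi-squared samples should be rejected far more often, mirroring the controls described above.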

Applying the Shapiro-Wilk Test ten times to all skaters who have spent a single season as an NHL regular and have recorded 10 or more primary points over the past 3 seasons produces the following result…


Given that the significance level of the test was 5%, the 5% rejection rate for the true exponential distribution is exactly as expected. The skater scoring data has a slightly higher rejection rate of 5.7%, meaning that it is not perfectly exponential, but close. The chi-squared distribution was effectively snuffed out by the test, as it was rejected in over half of the randomly generated samples. Separating forwards and defensemen produced similar results. Overall, it is concluded that the time between 5on5 primary points can be approximated well enough by the exponential distribution, though further testing is warranted.

Moving forward, the likelihood function for skater scoring rates will be constructed assuming that an exponential distribution generated the data. Every time a skater manages to gain a primary point, the time it took for him to do so is noted. The excess time left between the skater’s most recent primary point and the moment he last got off the ice is simply added to the time it took to score the skater’s first primary point (this is perfectly acceptable thanks to the memoryless property of the exponential distribution). Each one of those lengths of time is a piece of information that says something about the skater’s scoring talent. A player’s collection of primary points and the time required for their production are treated as independent, identically distributed exponential random variables. They are combined into a likelihood function which, as stated previously, unveils the likelihood of a certain underlying scoring rate (θ) given the evidence/data…
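Under the exponential assumption, the likelihood of a rate θ given n waiting times totalling T hours is L(θ) = θⁿe^(−θT). A small Python sketch, with hypothetical waiting times in place of a real player’s record:

```python
import math

def log_likelihood(theta, waiting_times):
    """Log of L(theta) = theta^n * exp(-theta * T), where T is the total time
    (in hours) across the n observed waiting times between primary points."""
    n = len(waiting_times)
    total = sum(waiting_times)
    return n * math.log(theta) - theta * total

# Hypothetical waiting times (hours of 5on5 ice time between primary points).
times = [0.7, 1.4, 0.2, 2.1, 0.6]
mle = len(times) / sum(times)  # maximum-likelihood rate = observed scoring rate
```

Setting the derivative of n·log(θ) − θT to zero gives θ̂ = n/T, the observed scoring rate over the sample; here `mle` works out to 1.0 primary points per hour.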


Using some basic calculus, we can find the scoring rate with the greatest likelihood given the available evidence for any player. For this likelihood function, that maximum is always the player’s observed scoring rate over the entire sample (total primary points divided by total hours). Due to sample size issues in particular, such an estimate would be unsuitable in many cases. The problem is that the likelihood function is initially completely agnostic, treating every potential scoring rate from 0 to infinity as equally plausible. Of course, there is plenty of historical data that can help identify which scoring rates are feasible over a 3-year period in the NHL. Incorporating that prior information can improve estimates and increase precision. The next section will tackle the process of defining prior beliefs with the aforementioned historical data.

The Prior Distribution – Defining Prior Beliefs

The choice of a prior distribution is, in principle, subjective. This subjectivity goes as far as allowing priors rooted in personal (psychological) probability, as long as those probabilities are coherent. Empirical Bayes is an approximation to a full Bayesian hierarchical model, and it involves estimating the prior distribution from the data itself. Using the data/evidence to define prior beliefs may seem contradictory, but the method works as long as each individual data point used to estimate the prior is one of many in a large collection. For example, fitting a prior distribution to the entire population of NHL skaters minimizes the influence of any single skater to the point where the prior can be thought of as roughly independent of each individual in the data set.

The aim is to fit a prior distribution to the scoring rates of forwards and defensemen (60+ games played) over a 3-year span. The data set is composed of three 3-year blocks encompassing the following seasons… 2009-10 to 2011-12, 2012-13 to 2014-15, and 2015-16 to 2017-18. Scoring rates in each time interval are slightly adjusted so that the means are consistent with the latest of the three intervals. Also, skaters at the same position will share the same prior. This amounts to initially assuming that every NHL skater is of league-average ability, with a distinct notion of uncertainty attached to that assumption depending on the skater’s position (forward or defense).

Let’s plot the data…

It is common to use a gamma distribution, a generalisation of the exponential distribution, in cases where the likelihood is exponential/Poisson. The gamma is a convenient choice as it supports only positive values, is very flexible, and makes sampling from the posterior distribution (our updated beliefs) extremely simple – with an exponential likelihood, a gamma prior is conjugate, so the posterior is itself a gamma. The gamma distribution is an appropriate choice for defenders in this case. Forwards, on the other hand, are clearly not gamma-distributed. The forward data instead favors a different generalisation of the exponential – the Weibull distribution. Both distributions are fit using a maximum-likelihood approach with the MASS package in R. The same distributions are shown below, this time overlaid with their fitted priors…

A good fit in both cases, but it should be noted that the choice of prior distribution is typically further supported by a theoretical understanding of the process underlying the data. Unfortunately, there is very little theory concerning the expected distribution of scoring talent in the NHL that would tempt one to prefer a Weibull over a gamma, for example. The approach used in this piece relies solely on the range of supported values and a reasonable fit with the data. This process can lead to over-fitting, but since the sample of historical data is large, that is less of a concern.
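The article fits its priors by maximum likelihood with MASS::fitdistr in R. As a rough stand-in, a moment-matching gamma fit can be sketched in Python; the sample rates below are invented purely for illustration:

```python
def fit_gamma_moments(rates):
    """Moment-matching fit of a gamma prior to a population of scoring rates:
    shape * scale = mean, shape * scale**2 = variance. A simpler stand-in for
    the maximum-likelihood fit the article performs with MASS::fitdistr in R."""
    n = len(rates)
    mean = sum(rates) / n
    var = sum((r - mean) ** 2 for r in rates) / (n - 1)
    return mean ** 2 / var, var / mean  # (shape, scale)

# Invented defenseman scoring rates, purely for illustration.
shape, scale = fit_gamma_moments([0.4, 0.7, 0.5, 0.9, 0.6, 0.8, 0.5, 0.6])
```

Moment matching and maximum likelihood will generally disagree slightly, but with a population-sized sample the two fits land close together, which is what makes the empirical Bayes shortcut defensible.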

The Posterior Distribution – Consolidating the Evidence and the Prior

The final distribution, the posterior, represents an updated belief about a skater’s true scoring rate and is proportional to the product of the likelihood function and the prior distribution.

The proportionality preserves the relative likelihood differences between scoring rates. This allows us to combine all terms which do not involve the scoring rate into a constant and effectively ignore them. The final forward posterior density with respect to any given scoring rate (θ) is defined as follows…


*The shape and scale parameters of the prior distribution were determined earlier, when fitting the Weibull prior to the historical data set

The posterior provides all the information necessary to deal with the Hischier vs. Kucherov scenario mentioned earlier. For each skater, it associates a relative likelihood with any given scoring rate. For example, a scoring rate with a relative likelihood of 2 is twice as likely as a different rate with a relative likelihood of 1. Of course, there is only one true scoring rate for each player, and so the posteriors allow us to sample a single rate for each forward. By repeatedly sampling rates and keeping track of how many times Hischier rates higher than Kucherov, one can estimate the probability that the former is the superior scorer. Taking 100,000 samples from each posterior (using rejection sampling), Hischier is the preferred scorer in about 43,000 cases. This implies that there is only a 43% probability that Nico is the better scorer, despite the fact that he possesses the higher observed scoring rate (2.0 PrimaryP/60 vs. 1.8 PrimaryP/60). We can plot those 100,000 samples and form approximations of each posterior distribution…
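The sampling step can be sketched in Python, assuming a Weibull prior for forwards. The point totals, ice-time figures, and prior parameters below are hypothetical placeholders, not the actual fitted values or either player’s real record:

```python
import math
import random

def unnorm_posterior(theta, n_points, hours, shape, scale):
    """Unnormalized posterior: exponential likelihood times Weibull(shape, scale) prior."""
    if theta <= 0:
        return 0.0
    log_like = n_points * math.log(theta) - theta * hours
    prior = ((shape / scale) * (theta / scale) ** (shape - 1)
             * math.exp(-((theta / scale) ** shape)))
    return math.exp(log_like) * prior

def sample_posterior(n_points, hours, shape, scale, n_samples, theta_max=6.0, seed=0):
    """Rejection sampling with a uniform proposal on (0, theta_max)."""
    rng = random.Random(seed)
    # Envelope height: the posterior's maximum over a fine grid, plus a margin.
    m = max(unnorm_posterior(i * theta_max / 2000, n_points, hours, shape, scale)
            for i in range(1, 2001))
    m *= 1.05  # small safety margin so the envelope covers between-grid peaks
    draws = []
    while len(draws) < n_samples:
        t = rng.uniform(0.0, theta_max)
        if rng.uniform(0.0, m) < unnorm_posterior(t, n_points, hours, shape, scale):
            draws.append(t)
    return draws

# Hypothetical inputs: primary point totals, 5on5 hours, and Weibull prior
# parameters (placeholders, not the fitted forward prior or real player data).
a = sample_posterior(36, 18.0, shape=2.2, scale=1.2, n_samples=2000, seed=0)
b = sample_posterior(100, 56.0, shape=2.2, scale=1.2, n_samples=2000, seed=1)
p_a_better = sum(x > y for x, y in zip(a, b)) / len(a)
```

The comparison in `p_a_better` mirrors the Hischier vs. Kucherov calculation: the fraction of paired draws in which the first player’s sampled rate exceeds the second’s.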


Knowing the probability that one skater is the more efficient even-strength scorer is significantly more enlightening than simply eye-balling their raw scoring rates and respective sample sizes. We can generate the same plot for any two skaters, as long as they play the same position. Here is Travis Dermott vs. Jake Gardiner…


You can find the code used to compute the posteriors and generate the comparison plots HERE. Instructions on how to use the provided functions to compare skaters are included.


Robinson, D. (2015). Understanding empirical Bayes estimation (using baseball statistics). Retrieved from

Ryder, A. (2004). Poisson Toolbox: a review of the application of the Poisson Distribution in hockey. Retrieved from

Shapiro, S. S., & Wilk, M. B. (1972). An analysis of variance test for the exponential distribution (complete samples). Technometrics, 14, 355-370.

More applications of empirical Bayes in sports

Alam, M. (2016). How good is Columbus? A Bayesian approach.

Yurko, R. (2018). Bayesian Baby Steps: Intro with JuJu.







Linking penalties and game minute in the NHL

By Ingrid Rolland and Michael Lopez

At the 10-minute mark of the first period during Game 2 of Tampa Bay’s 2nd-round series with Boston, Torey Krug was sent to the box for two minutes after committing this slashing violation against Brayden Point.


The Lightning cashed in on the ensuing power play, with Point scoring the game’s opening goal.

Fast forward to later in this same game, with Tampa Bay clinging to a 3-2 advantage and less than four minutes remaining in regulation. Brad Marchand skated past Anton Stralman for a scoring chance, and the Lightning defender reached around to commit what looked to be a similar violation to the one deemed a penalty on Krug above.


No penalty on Stralman was called, however, and Tampa Bay held on for a 4-2 win. It was the first of the team’s four consecutive triumphs over the Bruins that earned the Lightning a spot in the Eastern Conference Final.

“We hate to harp on the ref’s, but tonight they deserved to get harped on,” opined NBC’s Jeremy Roenick after the game. “How can you call [Krug’s] penalty early in the game, in such a big playoff game?”


Continue reading

NHL Scoring Trends, 2007-08 to 2017-18: Is the League Getting More Competitive?

Photo by Bobby Schultz, via Wikimedia Commons

Though it was completely tangential to @SteveBurtch’s line of thinking, his brief comments yesterday pondering the competitiveness among the middle of NHL lineups (which I can’t locate now, natch) got me thinking about whether the NHL and team management have gotten any more efficient or competitive overall in the last decade. With 10 years of complex Corsi data in the books, and hockey’s seeming “Moneyball moment” fully here regardless of the quibbling on social and mainstream media, is the league getting any tighter?

Continue reading

The Launch of the Tape to Tape Project

Earlier this year, Rushil Ram, Mike Gallimore, and Prashanth Iyer launched Tape to Tape, an online tracking system that can be used to record locations of shot assists, zone exits, and zone entries. Rushil and I will be running the Tape to Tape Project in order to compile a database of these statistics with the application Rushil created. We have already had close to 30 trackers sign up from an announcement on Twitter last week.




Each individual will track zone exits, zone entries, and shot assists for games they sign up for. Once the games are complete, the data will be exported to a public Dropbox folder. The goal with this project is to enhance our understanding of these microstats as they pertain to coaching decisions, player performance, and wins. What follows next is a description of what we will be tracking, a brief summary of the research that describes why these specific microstats are important, and how we will be tracking these events.

Continue reading

An Introduction to NWHL Game Score

In July of 2016, Dom Luszczyszyn released a metric called Game Score. Based on the baseball stat created by Bill James (and ported to basketball by John Hollinger), game score aims to measure single-game player productivity.

While it’s often easy to compare players across larger sample sizes, comparing two different players’ performance on a given night can be difficult. If player A has a goal, two shots, and took a penalty, did that player outperform player B who had two assists and one shot? Game score attempts to answer that question by weighting each of the actions of each player to give us a single number representing their overall performance in that game.
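The weighted-sum idea can be illustrated with a toy example. The weights below are placeholders chosen for demonstration only, not the actual game score weights used by Dom or in the NWHL adaptation:

```python
# Illustrative weights only: placeholders, not the actual game score weights.
WEIGHTS = {"goals": 0.75, "assists": 0.7, "shots": 0.075, "penalties": -0.15}

def game_score(stat_line):
    """Single-game productivity as a weighted sum of box score actions."""
    return sum(WEIGHTS[stat] * count for stat, count in stat_line.items())

player_a = {"goals": 1, "assists": 0, "shots": 2, "penalties": 1}
player_b = {"goals": 0, "assists": 2, "shots": 1, "penalties": 0}
```

With these placeholder weights, player B’s two assists and one shot edge out player A’s goal, two shots, and penalty, which is exactly the kind of single-night comparison the metric is built to settle.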

Unlike Dom, whose main goal was to create a better way to evaluate single game performance, mine was to create a better statistic to evaluate the total contributions of players. There are no advanced metrics, like Corsi For percentage, or even Goals For percentage, available at this time in the NWHL. Because of this, points are the best way to evaluate players, even though other box score stats are available.

Continue reading

How Much Do NHL Players Really Make?

Under the provisions of the collective bargaining agreement between the NHL and the NHLPA, a player’s cap hit and the salary they are paid can be two very distinct values in any given year. But even when you understand those differences, how much do NHL players actually take home?

Players’ actual earnings are diminished by a number of factors, including escrow, agent fees, and taxes. Agent fees can range from 2-6% depending on representation agreements and services rendered. Tax rates vary throughout the NHL depending on the country, state, and city in which a team and player reside and play. But of all the deductions from their income, escrow might be considered the greatest annoyance, as it is a mechanism to ensure that the owners collect a greater share of hockey-related revenues (HRR) than they did under previous collective bargaining agreements (CBAs).

So what is escrow, how much does it actually deplete a player’s salary, and why has it contributed to the tensions between players and owners?

Continue reading

Revisiting Relative Shot Metrics – Part 2

In part 1, I described three “pen and paper” methods for evaluating players based on performance relative to their teammates. As I mentioned, there is some confusion around what differentiates the relative to team (Rel Team) and relative to teammate (Rel TM) methods (it also doesn’t help that we’re dealing with two metrics that have the same name save four letters). I thought it would be worthwhile to compare them in various ways. The following comparisons will help us explore how each one works, what each tells us, and how we can use them (or which we should use). Additionally, I’ll attempt to tie it all together as we look into some of the adjustments I covered at the end of part 1.

A quick note: WOWY is a unique approach, which limits its comparative potential in this regard. As a result, I won’t be evaluating/comparing the WOWY method further. However, we’ll dive into some WOWYs to explore the Rel TM metric a bit later.

Rel Team vs. Rel TM

Note: For the rest of the article, the “low TOI” adjustment will be included in the Rel TM calculation. Additionally, “unadjusted” and “adjusted” will indicate if the team adjustment is implemented. All data used from here on is from the past ten seasons (’07-08 through ’16-17), is even-strength, and includes only qualified skaters (minimum of 336 minutes for Forwards and 429 minutes for Defensemen per season as estimated by the top 390 F and 210 D per season over this timeframe).

Below, I plotted Rel Team against both the adjusted and unadjusted Rel TM numbers. I have shaded the points based on each skater’s team’s EV Corsi differential in the games that skater played in:

Continue reading

Revisiting Relative Shot Metrics – Part 1

Relative shot metrics have been around for years. I realized this past summer, however, that I didn’t really know what differentiated them, and attempting to implement or use a metric that you don’t fully understand can be problematic. They’ve been available pretty much anywhere you could find hockey numbers, and have often been regarded as the “best” version of whatever metric (Corsi/Fenwick/Expected Goals) they were applied to when evaluating skaters. So I took it upon myself to gain a better understanding of what they are and how they work. In part 1, I’ll summarize the various types of relative shot metrics and show how each is calculated. I’ll be focusing on the relative to team, WOWY (with or without you), and relative to teammate methods.

A Brief Summary

All relative shot metrics, whether WOWY, relative to team (Rel Team), or relative to teammate (Rel TM), are essentially trying to answer the same question: how well did any given player perform relative to that player’s teammates? Let’s briefly discuss the idea behind this question and why it was asked in the first place. Corsi, usually in the form of on-ice Corsi For % (abbreviated CF%), is easily the most recognizable statistic outside of the standard NHL-provided boxscore metrics. A player’s on-ice CF% accounts for all shots taken and allowed (Corsi For / (Corsi For + Corsi Against)) when that player was on the ice (if you’re unfamiliar, please check out this explainer from JenLC). While this may be useful for some cursory or high-level analysis, it does not account for a player’s team or a player’s teammates.
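The CF% formula quoted above is simple enough to state in code; a trivial sketch:

```python
def corsi_for_pct(corsi_for, corsi_against):
    """On-ice Corsi For percentage: CF / (CF + CA), expressed as a percentage."""
    return 100.0 * corsi_for / (corsi_for + corsi_against)
```

For example, a skater on the ice for 55 shot attempts for and 45 against has a CF% of 55.0.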

Continue reading

Women’s Olympic Hockey Predictions

It’s the Olympics again, which means it’s time for everyone’s favorite activity: watching Canada underperform at ice-hockey! And while Hilary Knight breaking the hearts of Canadians is fun for everybody, the only thing that’s more fun is watching Hilary Knight break the hearts of Canadians while you have a statistical model that predicts each team’s likelihood of winning a medal! That’s right, Hockey Graphs is taking on the challenge of predicting the Women’s Olympic Hockey Tournament results.[1]

Continue reading