Few questions in hockey analytics are more fundamental than who played well. Consequently, a large portion of hockey analysis has focused on how best to measure results. This is some of the best-known work in “fancy stats”: when evaluating players and teams, many people who used to look at goals scored moved on to Corsi and then to expected goals (xG).
The concept of an xG model is simple: use the results of past shots to predict how likely a particular shot is to become a goal. Then credit the player who took the shot with that “expected” likelihood of scoring, regardless of whether or not it went in. Several such models have been developed, including those by Emmanuel Perry, Evolving Wild, Moneypuck, and many others.
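To make the idea concrete, here is a minimal sketch of that two-step logic. It is not how any of the models named above work: the toy shot records, the `distance_band` feature, and the function names are all made up for illustration, and a simple lookup table of historical goal rates stands in for a real statistical model built on many more features.

```python
from collections import defaultdict

# Hypothetical toy data: (shooter, distance_band, was_goal).
# Real play-by-play data has far more detail than this.
past_shots = [
    ("A", "close", True),
    ("A", "close", False),
    ("B", "close", True),
    ("B", "close", False),
    ("A", "far", False),
    ("B", "far", False),
    ("B", "far", False),
    ("B", "far", True),
]

def fit_goal_rates(shots):
    """Step 1: estimate the goal probability for each kind of shot
    from the results of past shots (here, goals / shots per band)."""
    counts = defaultdict(lambda: [0, 0])  # band -> [goals, shots]
    for _, band, was_goal in shots:
        counts[band][0] += int(was_goal)
        counts[band][1] += 1
    return {band: goals / n for band, (goals, n) in counts.items()}

def credit_xg(shots, rates):
    """Step 2: credit each shooter with the expected probability of
    every shot they took, whether or not it actually went in."""
    totals = defaultdict(float)
    for shooter, band, _ in shots:
        totals[shooter] += rates[band]
    return dict(totals)

rates = fit_goal_rates(past_shots)
xg = credit_xg(past_shots, rates)
print(rates)
print(xg)
```

In this toy example player A scored once and player B scored twice, but each is credited only with the historical scoring rate of the shots they took, so the xG totals reflect shot quality and volume rather than finishing luck.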
However, there is still room to improve these models. They do impressive work with the available play-by-play (pbp) data, but that data only captures so much. There are big gaps in the information it records, and we know that filling them would make us better at predicting goals.
Perhaps the biggest gap is pre-shot movement. We know that passes before a shot affect the quality of the scoring chance, but the pbp data does not include them. Thankfully, Corey Sznajder’s data does. While it does not cover every shot over multiple seasons, it is a substantial dataset; when I pulled the data for this model, it included roughly half of the 2016-2017 and 2017-2018 seasons: 72 thousand shots from 1,085 games. The number of games tracked varies by team, but we have at least 43 for every team except Vegas, for which we have 26. We can use this data to build the first public xG model that incorporates passes.