An early look into some of the new numbers available

From Wikimedia Commons: A graph showing the minimum value of Pearson’s correlation coefficient that is significantly different from zero at the 0.05 level, for a given sample size.

There are two new and very exciting frontiers being explored by the hockey analytics blogosphere. The first is the manual tracking of zonal statistics, such as zone entries and exits. This area of research was pioneered by Eric Tulsky and Corey Sznajder. The second is the splicing of Corsi into microstates, such as looking at shot attempt differentials in the moments after face off wins or losses in particular zones. The early workers on these numbers were Tyler Dellow and Muneeb Alam.
(side note: it should not be a surprise that one of each group was picked up by an NHL team this summer)

I recently was able to get data from the non-NHL hires named above (and will enjoy their contact while I can until they are picked up too). Sznajder provided me with zone entry and exit data for just over 60% of the NHL season. If you would like to check out his project and contribute, check this link. Alam sent over shot attempt events in the 10 seconds after a defensive zone face off, separated into wins and losses.

I originally received this data for study of the Jets and noticed what appeared to the eye to be a relationship, and wished to delve in further.

I noticed that with the Jets, the defenders who had better Zone Exit Success% (successful zone exits with possession per puck touch) tended to have better DZ FOW Corsi% (shot attempt differential after a defensive zone face off win). This seems intuitive given the often-repeated ideal that the best defense is to get the puck out of your zone and keep possession of it.

So, I decided to look at the relationship for all defenders in 2013-14.

Figure 1: Zone Exit Success% against DZ FOW Corsi% for all defenders, 2013-14.

Right away we can see that the relationship is not very strong. The Pearson correlation coefficient (r) comes out to 0.227, which is not an overly large correlation. The coefficient of determination (r²) is not much better at 0.05. However, the hockey analytics community sometimes makes the false assumption that a “weak relationship” is equivalent to no relationship. The title graph above shows how the significance of a non-zero relationship depends on the sample size. It also depends on the nature of what is being recorded. Hockey, by its very nature, has more variables than just a single person’s input; there are the nine other players on the ice and the systems in play, all of which affect the results as well.
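For readers who want to reproduce the idea behind the title graph, here is a minimal sketch of how the smallest significant correlation shrinks as the sample grows. It assumes the Fisher z-transformation (a normal approximation, not necessarily the method used to draw the Wikimedia graph), and the function name `min_significant_r` is my own.

```python
import math
from statistics import NormalDist


def min_significant_r(n, alpha=0.05):
    """Approximate smallest |r| that differs from zero at level alpha.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3) when the true correlation is zero.
    This is an approximation, so very small samples will be slightly off.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    # Sample sizes chosen for illustration; 220 matches this study.
    for n in (10, 30, 100, 220, 1000):
        print(f"n = {n:4d}  minimum significant |r| = {min_significant_r(n):.3f}")
```

The threshold falls quickly with sample size, which is why a correlation that looks weak on a scatter plot can still be distinguishable from zero.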

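With a sample size (n) of 220, a correlation of 0.227 is significantly different from zero at better than the 0.01 level. In other words, a sample correlation this large would be very unlikely to appear if the true relationship between the variables were zero.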
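The significance check can be sketched in a few lines. This is only an approximation under stated assumptions: it uses the Fisher z-transformation with a normal distribution rather than an exact t-test, and the function name `fisher_z_p_value` is hypothetical. The inputs r = 0.227 and n = 220 are the values reported above.

```python
import math


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for the null hypothesis of zero correlation."""
    z = math.atanh(r) * math.sqrt(n - 3)  # standardized Fisher z statistic
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


r, n = 0.227, 220
p = fisher_z_p_value(r, n)
print(f"r^2 = {r * r:.3f}")
print(f"approximate two-sided p-value = {p:.4f}")
print("significant at the 0.01 level" if p < 0.01 else "not significant at the 0.01 level")
```

Note that the p-value speaks to how surprising the sample would be if there were no relationship at all; it says nothing about how large the true relationship is.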

It is early, but there does seem to be a relationship, even if small. How strong the true relationship is, though, will take far more in-depth study than this.

For the more scientifically inclined:

There are some large possible sources of error in this quick study.

For one, the two variables do not cover the same set of games. The Zone Exit Success% covers only just over 60% of the 2013-14 NHL season, while DZ FOW Corsi% covers the full season.

Another possible source is the very small sample size behind DZ FOW Corsi%. The mean and median observed shot attempts for after a DZ FOW were 4.32 and 4, respectively. With such a small sample, a single shot attempt for could wildly swing the overall Corsi% value. (For those wondering, the mean shot attempts against value was 11.7.)
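To put a rough number on that swing, here is a small illustrative calculation. It assumes the usual definition of Corsi% as attempts for divided by total attempts, and it uses a hypothetical defender with 4 attempts for (the median above) and 12 against (the mean of 11.7, rounded). These are not any real player's numbers.

```python
def corsi_pct(attempts_for, attempts_against):
    """Corsi% as the share of all shot attempts taken by the player's team."""
    return 100.0 * attempts_for / (attempts_for + attempts_against)


# Hypothetical defender near the league median and mean reported above.
base = corsi_pct(4, 12)
plus_one = corsi_pct(5, 12)

print(f"4 for, 12 against: {base:.1f}%")
print(f"5 for, 12 against: {plus_one:.1f}%")
print(f"swing from one extra attempt: {plus_one - base:.1f} percentage points")
```

A single extra attempt moves this hypothetical player by more than four percentage points, which is large relative to the spread one would typically see between defenders.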

The good news is that, hopefully, in time we will have access to a far larger data set, allowing us to mitigate these issues. If the NHL ever goes live (and public) with something like SportVU, then we will really start to see some gold in these kinds of studies.
