Scoring talent influence on goal differentials and statistical double dipping


In August, I wrote an article on how to translate Corsi differentials into the average expected goal differential for players of similar average ice time.

In the article, I used an example of how this information could be used:

For example, Matt Halischuk and Eric Tangradi are two players who averaged 4th line minutes on the Winnipeg Jets. Tangradi finished the season with a 53.9% Corsi, while Halischuk was at 44.0%. Over the span of a season, forwards with those Corsi% would be expected to have, on average, a -1.04 and a -4.77 goal differential respectively. Therefore, on average, a 53.9% Corsi fourth line forward is worth 3.73 goals more than a 44.0% Corsi forward. Another option is comparing these players to the 46.8% Corsi of an average fourth line player. The goal differentials can then be used to estimate win values using Pythagorean relationships.

There is a caveat with using raw Corsi% to estimate goal differentials: all effects, such as zone starts, still apply. The estimated goal differentials would be no more predictive than Corsi is; however, you can now measure Corsi impact in terms of goals and wins more easily and accurately.
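The Pythagorean step mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the exponent of 2, the 220 goals for and against baseline, and the choice to apply the whole differential to goals for are all hypothetical and not taken from the original article. Only the two expected goal differentials come from the example.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=2.0):
    """Expected win fraction from goals for and against (Pythagorean expectation)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)

# Hypothetical baseline team with an even goal differential (assumed values).
BASE_GF = 220.0
BASE_GA = 220.0
GAMES = 82

# Expected goal differentials from the example above.
gd_tangradi_like = -1.04   # 53.9% Corsi fourth line forward
gd_halischuk_like = -4.77  # 44.0% Corsi fourth line forward

def expected_wins(goal_diff):
    # Simplification: the whole differential is applied to goals for.
    return GAMES * pythagorean_win_pct(BASE_GF + goal_diff, BASE_GA)

gap_goals = gd_tangradi_like - gd_halischuk_like
gap_wins = expected_wins(gd_tangradi_like) - expected_wins(gd_halischuk_like)
print(f"goal gap: {gap_goals:.2f}, win gap: {gap_wins:.2f}")
```

The win gap depends heavily on the assumed baseline and exponent, so it should be read as a rough scale for the goal gap rather than a precise value.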

Now, I used the example of Halischuk and Tangradi for a few reasons. The main one is that they are familiar to me as Winnipeg Jets players. They are two fourth line players who have experienced similar usage but have polar opposite shot metrics. But there is another, more interesting reason.

They are an example of an intriguing situation not discussed often enough in the hockey analytics blogosphere.

Halischuk scored significantly more points per 60 minutes of 5v5 ice time than Tangradi. In the same season as the example above, Halischuk posted an average of 1.28 points per 60 minutes, while Tangradi was under two thirds of that at 0.77.

The difference is not unusual for these two fourth line players. Over the past few seasons, Halischuk has maintained a 1.93 P/60 with a sample size of over 1800 minutes. Tangradi, the far superior possession player, has maintained a far less appealing 0.72 P/60 with over 1200 minutes played. Their similarity in usage disappears, however, when extending to multiple seasons.

Using the same method as in my earlier Corsi article, fourth line players with a difference in points per sixty like that of 2013-14 Halischuk and Tangradi would on average experience a difference in goal differential of 3.19 goals per season. That is nearly the same in value as the 3.73 goal difference from Corsi, but in the opposite direction.

Now, Halischuk and Tangradi are not the norm. The two players have sat at opposite extremes for point production and Corsi differentials. They are among the most extreme plausible cases.

So, there is a relationship between scoring more points and outscoring your opponent. That should be obvious, but it is often ignored by the analytical community when comparing players’ Corsi%. But how strong is the relationship? Is it real? Is it statistically significant?

Here is the R² for each bucket’s P/60 and GD/60:

Screen shot 2014-09-09 at 9.02.48 AM

The relationship is real, although the R² decreases substantially from top line players to fourth line players. As a friendly reminder, a low R² is expected since differences in goaltending are not being accounted for.
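For readers who want to reproduce this kind of check, here is a minimal sketch of computing R² between P/60 and GD/60 within each bucket, taking R² as the squared Pearson correlation. The bucket names and every number in `buckets` are made up for illustration; they are not the data behind the figure above.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical (P/60, GD/60) samples per bucket; NOT the article's data.
buckets = {
    "first line": ([1.6, 1.9, 2.1, 2.4, 2.8], [-0.2, 0.1, 0.3, 0.2, 0.6]),
    "fourth line": ([0.5, 0.8, 0.9, 1.1, 1.3], [-0.4, -0.6, -0.1, -0.5, -0.2]),
}

for name, (p60, gd60) in buckets.items():
    print(f"{name}: R^2 = {r_squared(p60, gd60):.2f}")
```

With real data, each list would hold one value per player-season in that bucket.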

This does not diminish the value of Corsi, however. There are reasons why Corsi has been regarded as the superior of the two metrics for evaluating players, even if a combination would be best.

For one, Corsi talent has been a better bargain on the market than finishing talent. While both a good-possession, poor-producing player and a good-producing, poor-possession player can improve a team’s chance to win, possession players tend to be significantly undervalued relative to their contributions.

There is also the issue of sustainability. On-ice and personal shooting percentages are large drivers of point production, but they are also highly variable from season to season. On-ice shooting percentage alone explains a large percentage of the variation seen in points per 60. For the data set of all four lines, the R² for on-ice Sh% and P/60 sat between 0.5 and 0.6 for each bucket. But players have difficulty maintaining these percentages. Even with samples as large as three consecutive seasons, it’s been shown you should expect about 67% regression towards the mean for future on-ice Sh%.
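To make the regression figure concrete, here is a small sketch of what 67% regression towards the mean implies for a projection. The 8.0% league average and the 10.0% observed on-ice Sh% are hypothetical values picked for round numbers; only the 67% comes from the text above.

```python
def regress_to_mean(observed, league_mean, regression=0.67):
    """Keep (1 - regression) of the gap between observed and league mean."""
    return league_mean + (1.0 - regression) * (observed - league_mean)

# Hypothetical inputs, in percent (not from the article).
league_mean_sh = 8.0
observed_sh = 10.0

expected_future_sh = regress_to_mean(observed_sh, league_mean_sh)
print(f"expected future on-ice Sh%: {expected_future_sh:.2f}")
```

In this toy case, only about a third of the two point gap above league average is expected to carry forward.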

Another ignored factor is that what is best for a team depends on its own needs. The LA Kings, one of the league’s top possession teams over the last few seasons, were mocked by some for picking up Marian Gaborik, an aging and injury prone point producer and poor possession driver. However, the Kings never struggled with possession but rather with putting the puck in the net. The opposite could be said about the Colorado Avalanche, a team that scored at high rates but struggled in possession.

Finally, there is the “double dipping” factor. For the most part, better hockey players tend to be better overall. There is a relationship between puck possession and large sample scoring rates, although it is only a trend and exceptions exist (like the two players above).

We can see this “double dipping” effect when we separate the top line data set (the same players as in the top graph) further into small buckets. The top line data set was sorted by P/60 and grouped into buckets with an equal number of players (each being 5% of the sample). This diminishes the extent to which shooting percentage variance affects the relationship between the variables.
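The bucketing described above can be sketched as follows: sort by P/60, cut into 20 equal groups (5% each), and average within each group. The player values are generated from an arbitrary formula purely to have something to run; they are not real players, and the choice to fold any leftover players into the last bucket is my own assumption.

```python
def bucket_means(players, n_buckets=20):
    """players: list of (p60, corsi_pct). Returns (mean_p60, mean_corsi) per bucket."""
    ranked = sorted(players, key=lambda p: p[0])
    size = len(ranked) // n_buckets
    rows = []
    for i in range(n_buckets):
        start = i * size
        # Assumption: any remainder goes into the final bucket.
        end = (i + 1) * size if i < n_buckets - 1 else len(ranked)
        chunk = ranked[start:end]
        mean_p60 = sum(p for p, _ in chunk) / len(chunk)
        mean_corsi = sum(c for _, c in chunk) / len(chunk)
        rows.append((mean_p60, mean_corsi))
    return rows

# Hypothetical top line sample of 40 players (toy values only).
players = [(1.0 + 0.05 * i, 48.0 + 0.1 * i + ((i % 3) - 1) * 0.5) for i in range(40)]

for mean_p60, mean_corsi in bucket_means(players):
    print(f"P/60 {mean_p60:.2f} -> Corsi% {mean_corsi:.1f}")
```

Averaging within buckets smooths out individual noise, which is why the trend between the two variables shows up more clearly than in the raw player-level data.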

Screen shot 2014-09-09 at 11.41.50 AM

This relationship means that if you had a player with a +1 expected goal differential due to superior Corsi%, and also a +1 expected goal differential due to superior point production, the true value of the player would not necessarily be the sum of the two (+2), since the two estimates overlap.
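A toy example may help show why the two estimates overlap. In the sketch below, goal differential is built from two correlated inputs with a known effect of 0.5 each; fitting each input on its own then overstates its effect, so adding the two single-variable estimates counts the shared part twice. All numbers are invented for illustration and say nothing about the real size of the overlap.

```python
def simple_slope(xs, ys):
    """Slope from a one-variable least squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Toy standardized inputs that are positively correlated (r = 0.8).
corsi = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = [-1.0, -2.0, 0.0, 2.0, 1.0]

# Goal differential constructed with a known effect of 0.5 per unit of each.
TRUE_EFFECT = 0.5
goal_diff = [TRUE_EFFECT * c + TRUE_EFFECT * p for c, p in zip(corsi, points)]

slope_corsi = simple_slope(corsi, goal_diff)
slope_points = simple_slope(points, goal_diff)

naive_total = slope_corsi + slope_points   # adds the two separate estimates
true_total = TRUE_EFFECT + TRUE_EFFECT     # value of a player at +1 in both
print(f"separate: {slope_corsi:.2f} + {slope_points:.2f} = {naive_total:.2f}")
print(f"combined: {true_total:.2f}")
```

Here each separate fit gives 0.90, so the naive sum is 1.80 against a constructed value of 1.00.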

Extra: Averages for buckets

This could be considered the third part of a series. I initially took Tyler Dellow’s methods in Corsi and Context (now gone from the internet due to his hiring) and applied them to multiple seasons. I then discussed how these could be used to create expected goal differentials. Now we are here.

I did, however, update some of my data since the initial article. I removed players with fewer than ten games played in a single season. I also added offensive zone start deployment and point production for each bucket:

Screen shot 2014-09-09 at 9.34.16 AM


All the values above are the mean for each bucket, with the exception of goals, assists, and points per sixty minutes. Those values are the median for each bucket, since their distributions were heavily skewed.
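As a quick illustration of why the median is the safer summary for skewed rates, consider a bucket where one player has an outlying P/60. The values below are hypothetical and chosen only to show the effect.

```python
from statistics import mean, median

# Hypothetical P/60 values for one bucket, with a single high outlier.
p60_values = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 2.6]

bucket_mean = mean(p60_values)
bucket_median = median(p60_values)
print(f"mean: {bucket_mean:.2f}, median: {bucket_median:.2f}")
```

The single outlier pulls the mean well above what a typical player in the bucket produces, while the median stays at the middle value.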
