For quite some time there has been a debate between two camps: those who think a defenseman’s effect on save percentage should be added into player evaluations, and those who think that adding such information causes more harm than good to the analysis. Note that the second position does not mean defensemen do not affect save percentage; that is an entirely different stance.
When it comes to evaluating a player statistically, you want the number to account for two things: effect and control. If a statistic does not help quantify how a player improves their team’s chance of winning, it is useless for measuring effect. If a statistic carries so much noise, or so many other contributing factors, that an impractically large sample would be needed before it reflects the player’s contribution, it is useless for measuring the player’s control over the effect.
Repeatability is how you determine control. If an outcome is generally considered good but players struggle to be consistently good or bad at it, then there is not much control over the number. Take scoring, for instance: the best players at scoring tend to stay at the top of the league, while the worst tend to stay at the bottom. While there are exceptions to this, the general trend holds.
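This notion of repeatability can be illustrated with a quick simulation. The sketch below uses synthetic numbers, not real NHL data (the player count, talent spread, and noise levels are all assumptions for illustration), and compares year-over-year correlation for a stat driven by true talent against one driven by noise:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300  # hypothetical number of players

# Repeatable stat: spread in true talent is large relative to noise.
talent = rng.normal(0.0, 1.0, n)
scoring_y1 = talent + rng.normal(0.0, 0.5, n)
scoring_y2 = talent + rng.normal(0.0, 0.5, n)

# Non-repeatable stat: pure noise, no underlying talent signal.
luck_y1 = rng.normal(0.0, 1.0, n)
luck_y2 = rng.normal(0.0, 1.0, n)

r_scoring = np.corrcoef(scoring_y1, scoring_y2)[0, 1]
r_luck = np.corrcoef(luck_y1, luck_y2)[0, 1]
print(f"year-over-year r, talent-driven stat: {r_scoring:.2f}")
print(f"year-over-year r, noise-driven stat:  {r_luck:.2f}")
```

The talent-driven stat repeats strongly from one year to the next; the noise-driven one shows essentially no correlation, which is the statistical fingerprint of a number the player does not control.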
A game of hoops against your friend is a common analogy for how repeatability demonstrates control. If your buddy pulls off a trick shot and you think it was a fluke, you ask him to try it again.
The image above is the statistical equivalent of asking your friend to repeat that trick shot: 575 samples of a defenseman’s relative save percentage over two seasons plotted against the two seasons thereafter. Two-season spans were used to help diminish the variance voodoo that is goaltender save percentage.
Ultimately, defenders fail to sustainably improve or worsen their goaltender’s save percentage. The model explains only 2.6% of the variance in future results.
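To make an R² that small concrete, here is a minimal sketch using synthetic data with a deliberately weak talent signal. The spreads (0.004 for talent, 0.010 for noise) are illustrative assumptions, not values from the article’s dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 575  # same sample size as the study

# Weak true-talent signal buried under goaltending noise
# (the 0.004 / 0.010 spreads are illustrative guesses).
talent = rng.normal(0.0, 0.004, n)
first_two = talent + rng.normal(0.0, 0.010, n)
next_two = talent + rng.normal(0.0, 0.010, n)

# Coefficient of determination of a simple linear fit of
# next_two on first_two.
r = np.corrcoef(first_two, next_two)[0, 1]
r_squared = r ** 2
print(f"R^2 = {r_squared:.3f}")
```

When noise swamps the signal like this, knowing a defenseman’s past relative save percentage tells you almost nothing about his future one.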
Coefficients of determination are not naturally intuitive to most people, so why is the model’s weak R² an issue? Let’s look at the same data in a different way.
All player seasons were ordered from lowest to highest relative save percentage and then split into groups of 58–59. The relative save percentage was then averaged within each group for the first two seasons and again for the next two. This was a strict average of relative percentages, although a more thorough analysis would treat each group as a single entity when computing its relative save percentage.
The 10% of players with the most extreme negative impact in their first two seasons end up having, on average, no impact over the next two seasons. In contrast, the 10% of players with the most extreme positive impact in their first two seasons end up at about +0.005, less than 20% of their impact over the previous two seasons.
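The binning exercise described above can be sketched as follows. The numbers are synthetic stand-ins for the real 575-sample dataset, with a weak carry-over built in (the 0.15 slope and 0.012 spread are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 575

# Synthetic paired observations: relative save percentage over a
# player's first two seasons and over the two seasons after that.
first_two = rng.normal(0.0, 0.012, n)
next_two = 0.15 * first_two + rng.normal(0.0, 0.012, n)

# Order by the first-two-season value, split into ten near-equal
# groups, then average each period within every group.
order = np.argsort(first_two)
groups = np.array_split(order, 10)
first_means = [first_two[g].mean() for g in groups]
next_means = [next_two[g].mean() for g in groups]
for i, (f, nx) in enumerate(zip(first_means, next_means), 1):
    print(f"group {i:2d}: first two = {f:+.4f}, next two = {nx:+.4f}")
```

The group averages fan out widely in the first period and collapse toward zero in the second, the same regression-to-the-mean pattern the article describes for the extreme deciles.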
David Johnson recently responded to some criticism of the low repeatability of relative save percentage by indicating (rightfully) that hockey is a team sport and so there are multiple variables that affect all numbers. He added that players have an impact and therefore it is important. The problem is that, analytically, having an impact is not enough. The impact must be large enough that you can hold a certain level of confidence in the number.
When you evaluate a player statistically, you cannot extol or condemn the player for something that could just as easily be caused by any number of other factors as by the player himself.
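One way to see why “having an impact” is not enough is to compare the size of a plausible defenseman effect, roughly ±0.005, against the sampling noise on save percentage itself. A rough sketch, assuming shots are independent Bernoulli trials (a simplification; the function name and shot counts are made up for illustration):

```python
import math

def ci_half_width(sv_pct: float, shots: int, z: float = 1.96) -> float:
    """95% confidence half-width on a save percentage measured over
    `shots` shots, treating each shot as an independent coin flip."""
    return z * math.sqrt(sv_pct * (1 - sv_pct) / shots)

# At typical one-to-multi-season shot counts, the measurement noise
# dwarfs a ~0.005 on-ice effect.
for shots in (500, 1500, 3000):
    print(f"{shots:5d} shots: +/- {ci_half_width(0.920, shots):.4f}")
```

Even at a few thousand shots the confidence band is about twice the size of the effect being measured, so the number on its own cannot separate the player’s contribution from luck.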
There are three consequences to this finding. One, as already discussed, defensemen’s control over save percentage is minimal enough that adding on-ice save percentage effects to player evaluations contributes little beyond identifying players likely to regress to the mean. Two, a defender’s effect on shot metrics (specifically Corsi% and relCorsi%) remains the best way to approximate a player’s value. Three, for the most part a goaltender owns their save percentage (once the sample is large enough).