This is the first part of a five part series. Check out Part 2, Part 3, Part 4, Part 5 here. You can view the series both at Hockey-Graphs.com and APHockey.net.
The 2015-2016 NHL season is almost here, and our sport has come upon a new phase — arguably the third — in its analytics progression.
The first stage was about broad ideas and testing; I’ll call it “the Discovery Phase.” It involved public minds brainstorming large-scale ideas about the conventional truisms of the game, looking to prove and disprove that which many had taken for granted. It lent us ideas like the undervaluing of small players and terms like Corsi and PDO. It was revolutionary but not yet a revolution.
The second phase was “the Recognition Phase,” which was kicked off by the Summer of Analytics. Teams began to buy into public work as worthy of investment and began to question their own practices. Now, as we saw in baseball, a third phase is emerging: one in which much of the public is willing to accept the initially controversial public ideas, but in which analysts are pushing back on generalities in situations that are often team- and player-dependent.
We are now in a phase where analysts take a magnifying glass to every claim being made. For example, there is no more argument about whether or not Corsi is relevant or important — at least not among those in positions of influence. The question is in what cases it works best, and maybe more importantly, where and why it fails. Because it does, after all. There are players whose finishing abilities, defensive prowess, special teams impact and leadership mean that the value Corsi presents is significantly off base. And it’s important in a billion-dollar industry to figure out how to account for that. The same can be said for any of the metrics that came out of the Discovery Phase or that continue to be developed today.
The point of all this is that we’re at a stage where you no longer dismiss the exceptions; you dig into them. A lot in the world can be explained by simple variance, but the game of hockey is far too complicated to write off everything that doesn’t fit a successful model as mere noise.
And that is where this series comes in. I’ve written before about the importance of learning the lessons of the sports that have come before us. Baseball is well into its analytics era, with massive teams of analysts and interns crunching numbers to maximize efficiency without sacrificing projected image. Baseball came to its Magnifying Glass stage years ago, and some of its most important new concepts were examined with more scrutiny. Is on base percentage really the best measure of a player’s hitting ability? Is defense really overrated? Are there some pitchers that can control a hitter’s batting average on balls in play?
One of the biggest early analytics ideas in baseball was the Pythagorean Expectation, a way of evaluating whether a team’s record was more a result of true talent or of luck. Bill James found that dividing runs scored squared by the sum of runs scored squared and runs against squared — Win% ≈ RS² / (RS² + RA²) — gives a good approximation of a team’s true-talent winning percentage. If a team’s actual winning percentage was significantly above or below expectation, the thinking went, the team was likely in for some regression, in one direction or the other.
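For readers who want to play along at home, James’s formula is simple enough to compute in a few lines. This is just a sketch of the classic version with an exponent of 2 (baseball analysts have since refined the exponent, and hockey adaptations are what the rest of this series explores); the function name and numbers are my own illustration, not from any team’s actual totals.

```python
def pythagorean_expectation(scored, against, exponent=2.0):
    """Bill James's Pythagorean expectation:
    estimated true-talent winning percentage from goals/runs for and against."""
    return scored ** exponent / (scored ** exponent + against ** exponent)

# A team that scores exactly as much as it allows projects to .500...
print(pythagorean_expectation(250, 250))  # 0.5
# ...while outscoring opponents 2-to-1 projects to .800
print(pythagorean_expectation(200, 100))  # 0.8
```

Comparing that expected percentage to a team’s actual winning percentage is what flags the likely regression candidates.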
In much the same way as hockey analysts used poor Corsi numbers to predict the downfall of teams like the 2012 Minnesota Wild, the 2014 Toronto Maple Leafs, and the 2015 Colorado Avalanche, baseball analysts used run differentials to call out their biggest pretenders, such as the 2005 Washington Nationals. The Toronto Blue Jays this season, even prior to acquiring Troy Tulowitzki and David Price, had the best run differential in the league but a .500 record. Thanks to both regression and acquisition, that has changed fast.
Over the course of this series, I will look at some of the work that’s been done on Pythagorean expectation in hockey in the past and look at a few different ways of adapting it to fit the 2015-2016 NHL landscape. But considering the phase we’re in, I will also investigate whether it might contain some flaws. Are there teams that can systematically outperform their expectations? And if so, what could be behind that?
The first step is to examine the standings as a whole. I chose to look at data since the lost season — since calling it “the lockout” no longer works — and changed all shootout games into ties, awarding one point to each team. To our statistical knowledge, shootouts appear to be largely coin flips, so I chose to eliminate them for this analysis. Here you will find the adjusted NHL standings since 2005-2006, sorted by aPTS, the adjusted point total each team has once shootouts are removed. I am also posting the top and bottom 10 teams by aPTS over the course of the decade. Take a look and think about which teams you believe might have overachieved or underachieved, and how that relates to their goal differentials or records in one-goal games. Feel free to comment below or on Twitter. Part two will be out tomorrow.
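The shootout adjustment above can be sketched in code. Under the NHL’s standard points system (2 for any win, 1 for an overtime or shootout loss), re-scoring every shootout game as a 1-1 tie only changes one thing: a shootout win drops from 2 points to 1, while a shootout loss keeps its single point. This helper is my own illustration of that arithmetic, assuming the standard points rules; it is not code from the actual standings table.

```python
def adjusted_points(wins, ot_so_losses, so_wins):
    """aPTS: official point total (2 per win, 1 per OT/shootout loss),
    with each shootout win demoted from 2 points to 1, since shootout
    games are re-scored as one-point ties for both teams."""
    official = 2 * wins + ot_so_losses
    return official - so_wins

# Hypothetical team: 40 wins (5 of them in shootouts), 10 OT/SO losses.
# Official points: 2*40 + 10 = 90; aPTS: 90 - 5 = 85.
print(adjusted_points(40, 10, 5))  # 85
```

Note that a team that racked up shootout wins will see its aPTS fall well below its official point total, which is exactly the kind of overachievement this series is hunting for.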