Semin, the pinup girl and eternal darling of the Corsi asylum, is the prime example of how utterly BS advanced stats are in hockey.
— Slava Malamud (@SlavaMalamud) July 24, 2015
It happened again.
Someone said something against Corsi, maybe even in jest with a touch of sarcasm. The masses counterattacked. Then the two sides started firing shots at each other.
It is not the first time, and it won’t be the last time.
In some ways, I understand and agree with the sentiment of letting things just go. Move on. Often we are only preaching to the choir.
Still, it’s an opportunity to teach, to use examples of common misconceptions, and to warn against potential issues. Some people are thirsty to know more.
This is not the first time I’ve written in response to misconceptions, but alas, here we are.
Let’s take a look.
Corsi is not a holistic statistic, nor does it try to be one
This is something that gets confused a lot.
Corsi, or any other shot metric, is not an attempt to value a player’s full contribution on the ice. Nor is it supposed to be a perfect measure of a team’s ability.
It is, however, an accumulation of many of the events that are byproducts of players and teams doing the right things to help their team win games. That’s why shot attempt differentials predict future goal differentials better than past goal differentials do.
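For readers newer to the stat, here is a minimal Python sketch of the counting behind Corsi: every shot attempt (goals, other shots on goal, missed shots and blocked attempts) counts, Corsi differential is attempts for (CF) minus attempts against (CA), and CF% is CF / (CF + CA). The `Event` type, the outcome labels and the sample teams are hypothetical stand-ins, not any league data feed’s format, and real analysis usually restricts counts to 5-on-5 play, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical outcome labels that count as shot attempts for Corsi purposes.
CORSI_OUTCOMES = {"goal", "shot_on_goal", "missed_shot", "blocked_shot"}


@dataclass
class Event:
    team: str     # team that took the attempt (hypothetical field)
    outcome: str  # e.g. "goal", "missed_shot", or a non-shot event like "faceoff"


def corsi(events: List[Event], team: str) -> Tuple[int, int, int, Optional[float]]:
    """Return (CF, CA, Corsi differential, CF%) for `team`.

    Assumes `events` only involve `team` and a single opponent, so every
    attempt not taken by `team` counts against it.
    """
    attempts = [e for e in events if e.outcome in CORSI_OUTCOMES]
    cf = sum(1 for e in attempts if e.team == team)
    ca = len(attempts) - cf
    total = cf + ca
    # CF% is undefined when no attempts were recorded.
    cf_pct = cf / total if total else None
    return cf, ca, cf - ca, cf_pct
```

With made-up events, the faceoff is ignored and every kind of attempt counts equally:

```python
events = [Event("HOME", "goal"), Event("HOME", "missed_shot"),
          Event("AWAY", "blocked_shot"), Event("HOME", "shot_on_goal"),
          Event("AWAY", "faceoff")]
print(corsi(events, "HOME"))  # (3, 1, 2, 0.75)
```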
The analytical story does not end at Corsi. A good Corsi player is helping their team in specific areas of the game and tilting the ice, but this does not guarantee they are a better overall player than someone with a weaker Corsi differential.
So no, Corsi has never told you who the best player in the league or on a team is. It does tell you who the best Corsi player is, though, which is informative and helpful but not everything.
And here’s the thing: every person I know of who uses Corsi knows this. The people who don’t are in the anti-Corsi crowd.
Corsi is imperfect
It may surprise some that I would say this, but no one who uses Corsi regularly disputes that statement.
Like we just discussed, Corsi is useful and even important but does not cover all facets of the game.
Corsi also has flaws, and we as a community have been working on them for years.
Out there you can find variations that try to account for usage, or that adjust for factors like the score or the arena. There are regression-based versions and shot-location-adjusted numbers.
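As one illustration of the score-adjusted idea, here is a minimal Python sketch that weights each attempt by the shooting team’s goal margin at the time. Trailing teams tend to pile up attempts, so their attempts are weighted down and a leading team’s attempts are weighted up. The weights in `HYPOTHETICAL_WEIGHTS` are made-up placeholders for illustration, not values from any published model, and the function name and input format are likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL weights, for illustration only (not from any published model).
# Keyed by the shooting team's goal margin when the attempt happened,
# capped at +/-2 goals.
HYPOTHETICAL_WEIGHTS: Dict[int, float] = {-2: 0.85, -1: 0.9, 0: 1.0, 1: 1.1, 2: 1.15}


def score_adjusted_cf_pct(
    attempts: List[Tuple[str, int]],
    team: str,
    weights: Dict[int, float] = HYPOTHETICAL_WEIGHTS,
) -> Optional[float]:
    """Weighted CF% for `team`.

    Each attempt is (shooting team, shooting team's goal margin at the time).
    Returns None when there are no attempts.
    """
    cf = ca = 0.0
    for shooter, margin in attempts:
        # Clamp blowouts into the +/-2 buckets before looking up the weight.
        w = weights[max(-2, min(2, margin))]
        if shooter == team:
            cf += w
        else:
            ca += w
    total = cf + ca
    return cf / total if total else None
```

For example, `score_adjusted_cf_pct([("HOME", 0), ("HOME", -1), ("AWAY", 1)], "HOME")` comes out a bit under the unadjusted 2/3, because one of HOME’s attempts came while trailing and AWAY’s came while leading.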
These variants are still based on the same data and concepts. It’s an evolution of Corsi.
When I work for teams or players through CKM Management, I use a variant of Corsi I designed that’s specifically optimized for the 10-game samples we sell. It’s an improvement for that situation.
Everyone knows Corsi is imperfect, including the people who use it. They also know its uses, and that its quick and easy accessibility helps counteract its flaws. They at least have some sense of how deep those flaws go.
The public online community is well aware of Corsi’s flaws when using it, but still uses it because it is meaningful.
It’s people like Slava Malamud (or whoever he is impersonating) and Mike Kelly who don’t.
A fun aside on proprietary research and data
After Malamud went on his Twitter rant full of misrepresentations and straw-man arguments about Corsi (perhaps intentionally, as James Mirtle noted), analyst Mike Kelly made an interesting reply.
It’s not as if I’m against proprietary work. As I noted previously, I even have my own that I use for my work with teams and players.
There are pros and cons to both.
The pro to proprietary work is that you protect your own work for sale and use. It’s your competitive edge in the market.
The con is, with no one able to test your black box, no one knows whether what you are saying is true or not.
And here is the funny part: how does Kelly know what’s specifically being used and whether or not it’s better?
He may have been told by some teams or analytical companies that they use something else and they believe it to be better, but that’s probably about it.
It’s a “my daddy can beat up your daddy” kind of statement.
No one can really verify it as true. No one knows how big the difference would be. And no one knows if the proprietary work is simply a variant of shot attempt differentials.
There’s also the fact that bad analysis and decision making can still come from better data.
Once again we have someone dismissing something they don’t fully understand.
Malamud doesn’t seem to understand what Corsi says about Alexander Semin, let alone how it is used (at least in the way he represented it).
It’s tempting to lash back, but there’s not much you can teach someone who doesn’t want to learn.
There are people who do wish to know more, though, and they are the target audience for this article.
And no, Mike Kelly, I don’t think teams are letting you into their black boxes to figure out who has the best stuff. Especially not the nearly half of the league that hired the basement bloggers you used to mock.