Friday Quick Graphs: Shooting and Playmaking Contributions, 1967-68 through 2012-13

I’ve just finished a pretty massive dataset, so I’m geeking out a bit over what I can do with it. Just the beginning, above…this is the distribution of %TSh (player shots divided by estimated team shots in games they played) and %TA (same equation, but with assists) season performances, 20+ GP, from 1967-68 through 2012-13. Per recent arguments about Ovechkin, I’ve added lines showing where his best season (2008-09) and most recent full season (2012-13) fall on the list; his current season would fall approximately in the same place as last season.

Those of you who’ve been following me on Twitter know that I’ve put together a pretty substantial dataset, and I’ve been working through it with a metric I’ve used for a while. %TSh is a player’s shots divided by his team’s estimated shot total in the games he played (Team Shots / Team GP, multiplied by Player GP). The measure gives us an idea of the player’s shooting contribution to the team’s offense; it steps outside the pesky variance of shooting percentage and gets closer to a stable indicator of offensive role. I’ve done the same with %TA, which is the same equation with assists. The reason for estimated team totals is that we don’t yet have good data on which specific games players played before 1987-88, but the estimate runs essentially in lock-step with the real thing, and I want to provide a useful historical point of comparison. Doing this allows us to look 20 years further back.
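For anyone who prefers the arithmetic spelled out, here is a minimal sketch of that calculation. The function and variable names are mine, not the dataset’s, and the numbers in the example are purely illustrative.

```python
# A minimal sketch of the %TSh / %TA calculation described above.
# Variable names (player_shots, team_shots, etc.) are placeholders.

def pct_team_shots(player_shots, player_gp, team_shots, team_gp):
    """%TSh: player shots over the team's estimated shots in his games."""
    est_team_shots = (team_shots / team_gp) * player_gp
    return player_shots / est_team_shots

def pct_team_assists(player_assists, player_gp, team_assists, team_gp):
    """%TA: the same estimate, but with assists."""
    est_team_assists = (team_assists / team_gp) * player_gp
    return player_assists / est_team_assists

# Illustrative only: a player with 350 shots in 78 GP on a team
# that took 2,460 shots over an 82-game season.
print(round(pct_team_shots(350, 78, 2460, 82), 3))  # -> 0.15, ~15% of team shots
```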

The distribution above includes over 23,000 player seasons of 20+ GP; the orange distribution is %TA, and the black is %TSh. I used the marks to connect back to the previous week’s bizarre flame war over Ovechkin’s value and approach to the game: the top one shows Ovechkin’s peak year, 2008-09 (20%), which also happens to be the highest %TSh of all time. The bottom mark is Ovechkin’s 2012-13 (16.3%), which I’m using because his current season is just slightly higher – it would be good for 16th best in NHL history.

I also did a second graph, wanting to look at the relationship of %TSh to %TA, to see just how much they ran together:

Related to the previous post, I wanted to see whether the relationship between %TSh and %TA was so close that the two measures were telling me the same thing. %TSh is on the x-axis, and %TA is on the y-axis. Intuitively, they do run together a fair amount: rebounds off a player’s shots can become assists for the shooter, and a player who shoots a lot is generally engaged in the offense in other ways as well. That said, the plot above is scattered enough that the two metrics aren’t simply duplicating one another; each tells us something the other doesn’t.

I think %TA in particular can be a valuable counter-weight for assessing defensemen. Anyway, this is the tip of an enormous iceberg of data, so don’t be surprised to see me refer to and use %TSh and %TA again.
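If you wanted to check the relationship numerically rather than eyeball the scatter, a season-level correlation would do it. This is just a sketch: the DataFrame and its column names are assumptions of mine, not the actual dataset, and the toy numbers are only there to make it runnable.

```python
# Sketch: how tightly do %TSh and %TA track each other across player seasons?
# Columns ("gp", "pct_tsh", "pct_ta") are assumed names, not the real fields.
import pandas as pd

def tsh_ta_relationship(seasons: pd.DataFrame, min_gp: int = 20) -> float:
    """Pearson correlation between %TSh and %TA for seasons of min_gp+ GP."""
    qualified = seasons[seasons["gp"] >= min_gp]
    return qualified["pct_tsh"].corr(qualified["pct_ta"])

# Toy data only, to show the call; a correlation well below 1.0 on the real
# dataset would support the conclusion that the two measures are distinct.
toy = pd.DataFrame({"gp": [82, 82, 20, 75],
                    "pct_tsh": [0.20, 0.09, 0.12, 0.15],
                    "pct_ta": [0.16, 0.11, 0.10, 0.14]})
print(round(tsh_ta_relationship(toy), 2))
```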

Input versus Output: An Ongoing Battle that No One Knows About

The webcomic XKCD is written by Randall Munroe, a physicist who probably doesn’t know what hockey’s underlying numbers (i.e. #fancystats or advanced statistics) even are, let alone support them… yet, for the most part, he gets it.

Mainstream sports commentary is full of poor analysis when it comes to using numbers appropriately. Most of this comes from a lack of understanding of the difference between inputs and outputs, and of how much a player can actually control certain factors. (It should be noted that this is a broad generalization; not everyone falls into this category.)

Benjamin Wendorf touched on some of this in his recent article Why The Hockey News’ Ken Campbell is Wrong About Alex Ovechkin, but Campbell still didn’t get it.

What happened:

For those who do not know, here is a quick summary of Campbell’s article:
Continue reading

Outperforming PDO: Mirages and Oases in the NHL

Above is the progressive stabilization (game-by-game, cumulatively) of all-situations PDO over time for the 30 NHL teams. It’s a demonstration of the pull of PDO towards the average (1000, or the sum of team save percentage and shooting percentage with the decimals removed), and it gives you a sense of the end game: an actual spread of PDO, from roughly 975 to roughly 1025. In other words, if you were to use just this data, you could probably conclude that it’s not outside expectations for a team to outperform 1000 by about 25 (or 2.5%) on either side.
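As a reference for how a chart like the one above gets built, here is a rough sketch of a cumulative, game-by-game PDO series. The per-game input lists and function name are assumptions of mine; the scaling simply follows the definition given above, with shooting percentage plus save percentage landing at 1000 for a league-average team.

```python
# Sketch: cumulative all-situations PDO after each game, from per-game totals.
# Inputs are assumed to be parallel lists of goals and shots for/against.

def cumulative_pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO after each game, computed on cumulative totals to date."""
    pdo_by_game = []
    gf = sf = ga = sa = 0
    for g_for, s_for, g_against, s_against in zip(
            goals_for, shots_for, goals_against, shots_against):
        gf += g_for
        sf += s_for
        ga += g_against
        sa += s_against
        sh_pct = gf / sf          # team shooting percentage to date
        sv_pct = 1 - ga / sa      # team save percentage to date
        pdo_by_game.append(round((sh_pct + sv_pct) * 1000))
    return pdo_by_game

# e.g. a team shooting 8.5% with a .915 save percentage sits at exactly 1000.
```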

That’s all well and good, but PDO is a combination of two very different things, a team’s shooting and its goaltending, two variables that understandably have very little to do with each other (they are slightly related because rink counting bias usually affects both). Shooting percentage can hinge on a number of contextual variables, though because it draws on a team’s whole player population it usually stays reasonably close to league averages. Save percentage, on the other hand, hinges on one player, and what’s more, past performance suggests that a single goaltender can outperform expectations quite significantly. In this piece, I want to dig into the sliding variables of PDO and what we can expect from teams, but first I want to begin with why I’m working with all-situations PDO.

Continue reading

Forecasting Future Goalie Performance with Four Year Hockey Marcels

Evaluating goalies is hard. Goalie performance varies more than anything else in hockey: today’s terrible goalie can randomly turn into an elite goalie next season… and then turn back into a terrible goalie. The best measure we have for evaluating goalies is save percentage, so we often tend to use a goalie’s career SV% as a way of forecasting what to expect from him in the future.

However, it would make more sense not just to take a goalie’s career average SV% when forecasting future performance, but rather to take a weighted average in which we place greater importance on more recent data.  Eric Tulsky recently did this at his must-read blog, Outnumbered, and looked at what weight he should give each recent year’s data to forecast the next three years of a goalie’s performance:

So in my base case, I’m using years 1-4 to try to predict years 5-7. The best predictions came from weighting things like this:

  • Each shot faced in year 3 counts 60 percent as much as shots in year 4
  • Each shot faced in year 2 counts 50 percent as much as shots in year 4
  • Each shot faced in year 1 counts 30 percent as much as shots in year 4

This is very similar to the baseball forecasting system invented by Tom Tango, known as the Marcel Forecasting System.  Marcel, named after the monkey, is one of the most basic projection systems possible – it simply weights each of the last three years 5/4/3, applies a very basic regression to the mean, then adds a very basic aging adjustment.  Marcel is basic on purpose – it’s still pretty damn accurate, and the thinking goes that if a more complicated forecasting system can’t beat Marcel in baseball, it’s useless.  Surprisingly, most forecasting systems don’t improve upon Marcel by very much.
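To make the weighting concrete, here is a rough sketch of a Marcel-style weighted save percentage using the year weights Tulsky quotes above. The regression step, its “phantom shots” amount, and the league-average figure are illustrative placeholders of mine, not the parameters of the actual projection system, and there is no aging adjustment.

```python
# Rough sketch of a Marcel-style SV% projection with the weights quoted above
# (1.0 / 0.6 / 0.5 / 0.3, most recent year to oldest). The regression amount
# and league-average SV% below are illustrative assumptions only.

LEAGUE_AVG_SV = 0.912                 # rough league save percentage (assumed)
YEAR_WEIGHTS = [1.0, 0.6, 0.5, 0.3]   # year 4 (most recent) back to year 1

def marcel_sv_projection(seasons, regression_shots=1000):
    """seasons: list of (shots_faced, save_pct) tuples, most recent first."""
    weighted_saves = weighted_shots = 0.0
    for (shots, sv_pct), weight in zip(seasons, YEAR_WEIGHTS):
        weighted_shots += shots * weight
        weighted_saves += shots * sv_pct * weight
    # Regress toward the league mean by mixing in "phantom" league-average shots.
    weighted_saves += regression_shots * LEAGUE_AVG_SV
    weighted_shots += regression_shots
    return weighted_saves / weighted_shots

# Example: four seasons of shots faced and SV%, newest season first.
print(round(marcel_sv_projection(
    [(1800, 0.925), (1700, 0.910), (1600, 0.918), (1500, 0.905)]), 4))
```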
Continue reading