Hockey Talk is a (not quite) weekly series where you will get to view the dialogue among a few Hockey-Graphs contributors on a particular subject, with some fun tangents.
This week we started from a Twitter conversation suggesting that expected goals calculations (xG) might underweight “shot quality”, a topic on which HG contributors are hardly short of opinions.
dtmaboutheart: Follow @DTMAboutHeart
If there is no good way to measure shot quality then how can anyone test if xG is underweighting it?
mattcane: Follow @Cane_Matt
xG is literally the most direct way to measure shot quality that exists, given the current data. Unless there is a way to measure the information about where the puck was moving prior to the shot, you’re not going to squeeze anything more out of the current data.
nmercad: Follow @NMercad
People talk about shot quality but they don’t really understand it. What it is. What it means. Where and when it is appropriate or worthwhile to consider in analysis.
Chasing shot quality is chasing the dragon.
No coach is going to listen to anyone who says shot quality doesn’t matter. They also aren’t going to listen to someone who says it matters but can’t show how.
If they can’t show how, a coach, a hockey person, will use their own empirical evidence. Your best bet is still to show a coach how in a large sample its impact gets largely washed out.
So don’t focus too much on it in the grand scheme of things. Focus on other things that will help produce more sustained offense. Then let your players create out of that. Use intuitive solutions for quality. Like lateral passes and quick touches prior to scoring plays. Mess with goalie angle lines. Use traffic against your opponent. Etc.
bwendorf: Follow @BenjaminWendorf
Honestly, even with richer data, the improvements to existing shot quality models are going to be small increments. There are no leaps to be had, because the finer you slice the data the more troubling your samples are, and you can’t really escape how heavily shooting regresses.
Which is more a plea to quit with the idea that richer data will transform what we already know. The value in richer data will be in getting more answers related to the “how,” and making analytics more prescriptive.
rjessop: Follow @Thats_Offside
I think if we keep trying to answer whether shot quality exists, we’re gonna keep missing the point because we haven’t framed the question right.
We know shot quality exists. Shootout shooting percentage is higher than PP shooting percentage is higher than 5v5 shooting percentage. It is completely, abundantly obvious that shot quality exists.
The question to ask then is: “why does shot quality have so little relative impact on long-term results?”
My working hypothesis would be the impact of contemporary coaching basically creates an environment where generating shot quality is hard. Everyone sections off the middle of the ice, keeps wingers low, takes away the slot, boxes guys out, has sticks in lanes, etc. So if everyone’s doing this, it’s going to be harder and harder to get to the areas where “quality shots” are generated, so they’re going to be scarce. Also, if everyone is similarly successful playing their system, relative differences are going to be tiny.
If this is right (and it may not be but I have no way to test it either way), we would also be completely wrong to tell a coach that he shouldn’t worry about shot quality. Because it matters a TON.
The entire ecosystem in which we analyze is based on a leveled playing field in terms of shot quality due to contemporary systems. Ignore that, and the sort of shot quality equilibrium that’s been established falls apart, relative differences grow, and you see bigger effects.
The message then isn’t: “ignore quality, focus on quantity.” It’s: “we need to find ways to boost the number of pucks directed the right way within the framework of what we have here.”
Of course, this is all theory and conjecture. Makes sense to me though.
bwendorf: I think coaching systems are attempted workarounds to the general facts of play, instincts, and reflexes in hockey. Which is to say, it is always going to be harder to get shots from closer in, and easier to get them from further out.
jackhan: Follow @ml_han
That’s about right, Rhys. The NHL is hockey being played by generally comparable players, within generally comparable systems.
At the fringes you’ll find the Vaneks and Pat Kanes and the Bobby Ryans. But you can see why just by watching them. In women’s hockey you’ll find more players who really drive on-ice shooting %, but in most cases they drive volume too. Microstats give you a pretty good idea of why that is.
Guys like Desharnais, Tanguay and (to a lesser extent) Hudler shoot for higher % because they’re more selective. In fact it might be because they’re less confident in their shooting skills. Either they pass off or they try to get closer. Whereas Pacioretty, Kessel and Ovi will be a bit more ambitious and take the shot even if it’s not a gimme.
petbugs: Follow @petbugs13
And which set of players scores more goals? That right there should say everything about chasing shot quality.
Part of the problem I find in the shot quality discussion is the tendency to conflate shot context with shooting skill. It appears that people think they’re talking about shooting skill when in fact it’s usually the context that’s actually producing the observations they’re trying to explain.
The contextual qualities of the shot, like distance, angle, puck movement, screening, whether it’s a rebound, etc. are much more a product of collective offensive system and player creativity than of individual shooter skill. Shooters impact aim and velocity.
That split between the contextual factors, which again are a product of the offensive and defensive systems and the execution of all 10 skaters on the ice, and the shooting skill is what keeps getting left out of the discussion. So trying to find or show the repeatability of the skill component of shooting percentage is a Sisyphean task.
We already know that the impact of shot quality (context + skill) is minuscule in comparison to other factors, but to then try and split it out even further is going to be incredibly complex. Especially when it will be nigh on impossible to control for the myriad of factors that go into the contextual component.
And to be honest, I still have this sneaking suspicion that defensive system and execution has the largest impact on the contextual component. If true, this would mean that what we think of as shot quality is actually a function of the defense and not the shooter. So no wonder it’s difficult to show the repeatability.
jackhan: Still a concept worth teaching. Like faceoffs are worth teaching, even if they don’t really drive results. You still want to be adequately prepared.
Simple postgame talking point we had during a video session: outside the dots off the rush, a forward shoots about 2%. Cut inside the dots and you can expect to shoot closer to 8-10%. Much more efficient to take 1 shot instead of 4, so cut in if you can.
Also, in my experience, shooting % for dmen is mostly related to how well they short the zone, not how hard they shoot.
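The "1 shot instead of 4" talking point can be checked with a few lines of arithmetic. This is only a sketch using the rough percentages quoted above (2% outside the dots, 8-10% inside); the function name is made up for illustration.

```python
def shots_per_goal(shooting_pct):
    """Average number of shots needed to score one goal at a given shooting %."""
    return 1.0 / shooting_pct

# Rough figures quoted in the conversation, not measured data.
outside_dots = shots_per_goal(0.02)
inside_dots_low = shots_per_goal(0.08)
inside_dots_high = shots_per_goal(0.10)

# How many outside-the-dots shots are "worth" one inside-the-dots shot.
ratio_low = outside_dots / inside_dots_low
ratio_high = outside_dots / inside_dots_high

print(f"Outside the dots: {outside_dots:.1f} shots per goal")
print(f"Inside the dots:  {inside_dots_high:.1f} to {inside_dots_low:.1f} shots per goal")
print(f"One inside shot is worth about {ratio_low:.0f} to {ratio_high:.0f} outside shots")
```

At the low end of the quoted range (8%), one shot from inside the dots is worth about four from outside, which is where the "1 shot instead of 4" line comes from.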
petbugs: And that is exactly the kind of actionable insight the analytics community should be identifying. Chasing the holy grail of shooting skill is a fool’s errand.
bwendorf: I’ll leave it open for someone to say otherwise, but I don’t think anyone here would say shot quality doesn’t exist. I only make a point to say that because I still feel like there are people that believe we say that.
petbugs: Agreed. I think that was something that was said years ago, early on in this argument. But it was quickly amended to “it exists, but the impact is small so you can ignore it for the most part”.
It was an off-the-cuff, imperfect reaction to the suggestion that shot quality was what really matters. But it quickly evolved to the realization that yes, it does exist; it’s just rather inconsequential and difficult, if not impossible, to isolate. So why spend much effort trying to do so?
28 thoughts on “Hockey Talk: Shot Quality”
I don’t really get why you say that with better data, the improvements will be small. This was the exact argument that people used when they said shot quality basically didn’t exist.
Once the data is available, it either is better or it is not. Richer data is always better to have because it either improves your predictions, or lets you know that your predictions can’t be improved by something that should be measurable.
But I have a feeling that, based off of Steve Valiquette and Chris Boyle’s work, it is going to give us better predictions.
Chris Boyle makes up his results. Don’t hitch your wagon to his nonsense.
Well, that’s quite the accusation, but I wasn’t hitching any kind of wagon.
An NHL crease is 6 ft deep. According to Boyle, Jon Quick plays on average 10 feet out of his net. Or 4 feet out of his crease. Or 2/3 of another crease length. Which puts him roughly in the low slot for most of his saves.
His data is pulled from “70,000+ shots over the last 4-5 years isolating pre-shot movement.” But he never qualifies methodology or even makes this point known in his articles. Considering there are roughly 78,000 shots per season, the sample is limited. That isn’t an issue in itself (it is manual tracking after all and time consuming as hell if he is doing it). However, when the data is presented as in-season, when it is in fact past year data, and/or inconsistently sized samples across goalies (he tracks primarily Lundqvist and Price along with whoever they play), it leads me to the conclusion that some of what he is presenting is made up. If it isn’t, it is lacking peer-review and reliable methodology. Pick your poison.
Okay, that’s fine. It’s not the 100% accuracy of the numbers that are specifically important here. It’s the premise. That deflections, one timers, back door passes, breakaways, etc etc are more likely to result in goals. And that isn’t available in the current data. And it shouldn’t be dismissed. The improvements made with this data could be small, or it could be quite large.
The returns on subsequent shot quality models have been diminishing rapidly, for the very reasons I stated in the conversation. Shot quality models necessarily have to be hitched to scoring and its regressions (both in scoring and from regression and repeatability testing of shot quality measures). The longer you reach chronologically to help with sampling issues, the more likely you aren’t working in the same context (teammates, teammate/player age curves, etc.) — and the more you model it, the more abstract and less actionable it can become. To make matters worse, there is a lot of subjectivity, audacious claims, and hidden measures in shot quality research…so it’s important to see the actual work and data before jumping to conclusions about what might have been uncovered. Keep in mind some incredible efforts and research have been done in the past, by some incredible minds, and it all keeps coming back to the limitations I’m noting above.
For what it’s worth, all of us came to this because we liked hockey, statistics, and doing research pieces, so far be it from me to say somebody shouldn’t research shot quality if they want. But the limitations are more statistical than they are based on feel or conjecture, to the point that my interests are focused towards places where there is a lot more to uncover (“how” possession is achieved productively, for instance, and how team shot generation and player contributions have evolved over the last 50 years).
The additions to subsequent shot quality models have been diminishing rapidly because the data simply isn’t there. But you seem to be dismissive of even attempting to use any new data. Like adding the tracking data that NHL teams are trying to obtain, and that NBA teams use so well. I don’t see why you wouldn’t want to eventually use that data to see what kind of value it can add. I mean, I’m sure there is a reason that Crosby’s on ice shooting percentage was 2.5 standard deviations above average over 8 years.
Based on Travis Yost’s article (http://www.tsn.ca/examining-on-ice-shooting-percentage-by-position-1.338499), R^2 for centremen for on ice shooting percentage is .4043. Using his exact parameters, R^2 for pts/60 is .5239. For wingers and on ice shooting %, r^2 is .2899. For pts/60 it’s .389. Those are relatively small differences.
Sure, there is far more to uncover in neutral zone play etc. And I am having a great time reading about it in various places and discussing it. But that doesn’t mean that advances in other areas should be dismissed.
As long as shot quality pivots on a hard-regressing result like goal-scoring – which it must because, so I’m told, that’s the point – you are, statistically speaking, looking at a sliver of repeatable performance that has necessarily baked within it player movement, player shooting talent, player shooting location, and the same variables for all the player’s teammate combinations. Others also wish to bake in player movement into that slice as well, which is fine. But all those variables are fighting over a sliver of repeatable performance, bound by the limitations of goals as your result-orient. Unless you were willing to accept a different result, with richer data – as we have done with shot attempts, and you might with some other tracked variable in the future, though I don’t know what – you are going to be fighting the same battle with the data, not my dismissal. Me, personally, I’d rather focus on prescriptive solutions for richer result data.
But a player’s pts/60 is based around goals as well. So is it that you are saying that good players simply drive better on ice shooting %, some of it has already been explained, and there isn’t a whole lot left?
Won’t the same problem likely be run into pretty quickly with possession metrics?
Good players benefit from a confluence of their own contribution to on-ice shooting, and their tendency to play with better teammates, and sliding variables of all the other things I mentioned before, all of which factor more or less into whatever observed talent there might be after you account for regression. This is what happens when you talk about 150 or so events in a season that regress heavily. Shots do not regress the same amount, and provide roughly ten times the data points. And because they don’t regress as heavily, while they rely on contextual factors just as goals do, they make for better points of comparison.
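One way to see why ten times the data points matters is to look at how chance noise shrinks with sample size. The sketch below assumes event counts behave roughly like Poisson counts, so relative noise scales as 1/sqrt(n). That assumption, the function name, and the round 150 vs. 1500 figures are illustrative only and are not taken from any model discussed here.

```python
import math

def relative_noise(event_count):
    """Rough relative standard deviation of a count, assuming Poisson-like behaviour."""
    return 1.0 / math.sqrt(event_count)

goal_events = 150    # "150 or so events in a season", as quoted above
shot_events = 1500   # roughly ten times the data points

goal_noise = relative_noise(goal_events)
shot_noise = relative_noise(shot_events)
noise_ratio = goal_noise / shot_noise

print(f"Relative noise with {goal_events} events: {goal_noise:.1%}")
print(f"Relative noise with {shot_events} events: {shot_noise:.1%}")
print(f"Goal-based counts are about {noise_ratio:.1f}x noisier")
```

Under this simplification, ten times the events cuts the chance noise by a factor of sqrt(10), a bit more than three. It says nothing about how much real talent sits underneath either count.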
We aren’t talking about a small sample over 150 in one season though. The r^2 of over .4 was found with shots comparing two samples of over 1000 shots. Ovechkin leads the sample with just barely shy of 5000 shots over the two samples. No one has fewer than 1000 shots.
There are many things, like defensive metrics in baseball, that have very low correlation year over year. But as the sample gets larger, like around 3 years depending on position, the statistic becomes reliable and repeatable.
I mean, for centremen based on that methodology, the on ice shooting % r^2 is only .0006 lower than it is for CF%. So on ice shooting percentage seems just as repeatable as shot attempts over a large sample, which seems to buck what you are saying.
The shots aren’t the sample, because you’re using the goals to establish talent. Now if you were doing expected goals, like DTM, you would be grounding the metric in the shot samples and using regressed adjustments based on goal-scoring.
Over larger samples, the on-ice shooting becomes more repeatable because it has mostly regressed to the league-average. At which point it has lost a lot of its discernible “talent” from player to player. At the extremes, of course, you will have some shooting talented/untalented players, and most of those players are also shot-metric talented/untalented as well. But because of the regression, most players’ on-ice goal-metrics will fall back to average, while shot-metrics will maintain the nuance among all those players that fell to the middle by the on-ice goal-metrics.
No, the sample is the shots. The success/failure is the goal/no goal. And no, it doesn’t regress to league average to a very large degree over large samples. The standard deviation for forwards who’ve played at least 4000 minutes from 2010 to 2015 is .9205. Having 68% of your sample being 1% below or above the league’s shooting average is significant. That’s the difference of ~5 goals per year above/below average
You’re missing my point. You need the “success” to create the metrics you’ve cited above. And the success regresses heavily.
You can say 68% of your sample being 1% below or above the league’s shooting average is significant, but you have to identify component parts and determine whether you’re simply looking at a correlation or understanding where the numbers are coming from. Take your figure, 5 goals per year. How many of those goals are due to shooting talent? How many to pre-shot movement? How many due to quality of teammates? How many due to “systems”? When you’re swapping around models that quibble over a couple goals per year difference between 68% of the playing population, and assign different values for the variables I describe above, you really start to lose the forest for the trees.
It’s not quibbling over a few goals. Normal distribution around 1% would mean that you’d expect 34% of the population to be 10 goals higher than another 34% of the population per year. Over a three year sample, that’s the difference between having Marian Hossa’s GF% and Andrew Shaw’s GF%. This isn’t insignificant.
“How many of those goals are due to shooting talent? How many to pre-shot movement? How many due to quality of teammates? How many due to “systems”?”
This is exactly my point. I don’t know. Because for the most part, we don’t have enough data. And I haven’t seen anything that explains how much of it is explained in DTM’s Exp. Goals (as an aside, it would be really cool to have those score adjusted) which adjusts for location, side of the ice, type, on the rush, and rebound.
Well, to your first point, that’s simply not correct mathematically.
To the second point; DTM is capturing what there is to capture of shot quality quite well. But as you answer the questions I asked with statistical models, you move further and further into conjecture (on what’s being identified/tracked, and how it’s represented statistically). If you create “honest” statistical models, with appropriate levels of regression and error considered, you’ll realize exactly what I’ve stated above: subsequent models are going to have diminishing returns.
Realize that every little category that you dice your data into has to be explained, shown to be repeatable, and not introduce multicollinearity issues.
FWIW, we do have a lot of the data necessary to determine what’s predictable, to the extent that future success can be predicted. The problem is low-scoring, the salary cap, and increasing adoption of analytics are bringing teams closer together, which introduces more randomness into the results. That’s a problem for shot-oriented metrics, but it’s even worse for shot quality models, which need goal-scoring to help set benchmarks and assess/adjust player metrics.
I guess the example was pretty poor. And probably doesn’t make that much sense. But what I was saying is that if Hossa’s on ice shooting % was decreased by two standard deviations, his GF% goes from 60.4 to 56.9. Not the best example, but Patrick Kane is probably a better one. He is basically 1 standard deviation above average over three seasons. If that was reduced by that one standard deviation, his GF% goes from 58.1% to 52%. But this is beside the point.
Yes, DTM is capturing what there is to capture extremely well. But how much does what there is to capture explain it? That’s what I’ve been trying to get at the whole time. Obviously, as you chip away at the possible explanations of something, there will be diminishing returns. No one is disputing that. But that doesn’t mean that those returns are automatically insignificant, or not worth pursuing.
I think DTM’s adjustments do well to explain what is explainable in shot quality, and as I expressed I see the returns beyond it diminishing, in part because of what that leaves over that can be explainable, even with new data. If, to you, what isn’t already explained in shot quality by DTM’s model is significant, and can be explained by something you have derived, then by all means show your work. It will not be the first, nor the last time someone is convinced that, because some players have higher GF%, there must be a silver-bullet reason, applicable to the player population. The fact is there are many sliding variables that create those differences, some applicable to some players and some inapplicable to others, and there is little agreement except for the few variables (among many that have been introduced and tested) that I have mentioned. And they cover enough of what can be called “shot quality” to ensure the remainder of what might be repeatable is quite small.
It’s easy to talk about what is “worth doing” when you haven’t done the extreme amount of statistical modeling and testing necessary to determining a shot quality variable beyond what has already been tested. I’ve seen people spend years of their life on it, only to find very little, for the same reasons I keep stating over and over. That’s not worth it, to me; I’m not getting paid, and this is a hobby. But you really ought to do the work yourself, if you think it is worth it. Show significance, show value, and show your work.
I can’t show my work. His on-ice ExG data isn’t available as far as I know. But R^2 on out/underperforming personal exSh% is .1427 (minimum 251 shots in the lower year group). http://i.imgur.com/mVAgz2U.png
Exp goals is a marginal improvement over shot attempts. And I’m sure that, as richer data becomes available, any new model will have a marginal improvement.
Baseball analytics didn’t stop at batting average, on-base percentage, OPS, etc. That’s why we now have wRC+. Those other stats were good predictors, but each successive stat was marginally better. And there will be even more that are marginally better, because the data is becoming richer. And the smart people in that sport are embracing and studying the richer data.
I mean do the work on shot quality. Research and build a shot quality model, because as you say it is significant and worth doing. So do it, and show your work that affirms what you are saying. In fact, I’m sure DTM would be more than happy to talk and compare with ExG after you finish and share it with him.
You’re stuck on the idea that I said “stop,” and misattributing it to mean “stop everything.” Let me state clearly, again and as I have to with you over and over, that when it comes to shot quality, the returns are diminishing and richer data isn’t going to help for hard statistical reasons. The richer data will be far more valuable to reveal other things, where returns will still be palpable and actionable, like the “how” behind generating scoring opportunities.
I’m not even going to entertain the baseball analogy, because as I’ve said, I’m not saying stop everything, nor do I think the baseball metrics you are comparing to shot quality models are analogous.
Also, I don’t have the ability to do it. I don’t know how to scrape, etc. And I don’t think the data is there right now to improve it. Like I said, I believe that DTM’s is probably as good as it gets with what’s available. I’m not saying that I could do better, or that anyone could do better right now. But the way you answered the question above, it seemed like you were anti richer data. Which just seems silly to me.
You read it wrong, then, or you stopped reading at some point partway through the response. Probably because you were struck by the “silliness.”
Seriously, though, if what you’ve spent all this time talking about is as important as you say it is, I’ll assume you know the existing data well enough and you ought to learn how to scrape and run the tests. It’s not about “doing better,” it’s about pursuing what you’re interested in and testing your hypotheses. You’ve been vehemently asserting your hypothesis, so the ball is in your court.
“Simple postgame talking point we had during a video session: outside the dots off the rush, a forward shoots about 2%. Cut inside the dots and you can expect to shoot closer to 8-10%. Much more efficient to take 1 shot instead of 4, so cut in if you can.”
This is pretty much exactly right. So if you can identify players who are more consistently able to shoot from ‘inside the dots’ (as compared to line mates or team mates) then is that not valuable information?
In my eGF there are lots of players who have a CF >50% but an eGF < 50%, which is largely due to the fact that they don't shoot from good locations. Since eGF more closely correlates with actual GF% it's hard to argue against shot quality IMO. Further, if we had more precise data about defender locations when a shot was taken we could better isolate the shot quality issue.
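To make the CF% versus eGF% gap concrete, here is a toy sketch. It is not the commenter's formula (which isn't given): it simply assumes each shot attempt carries some goal probability and that eGF% is the share of summed probabilities. Every number below is invented for illustration.

```python
# Hypothetical per-attempt goal probabilities while one player is on the ice.
# These values are invented for illustration; they are not real NHL data.
attempts_for = [0.03] * 10 + [0.08] * 2      # lots of low-percentage attempts
attempts_against = [0.03] * 5 + [0.10] * 5   # fewer attempts, more from in close

def share(for_total, against_total):
    """Fraction of the combined total that belongs to the 'for' side."""
    return for_total / (for_total + against_total)

# Corsi-style share: every attempt counts the same.
cf_pct = share(len(attempts_for), len(attempts_against))

# Expected-goals share: every attempt is weighted by its goal probability.
egf_pct = share(sum(attempts_for), sum(attempts_against))

print(f"CF%  = {cf_pct:.1%}")
print(f"eGF% = {egf_pct:.1%}")
```

With these made-up inputs the player wins the attempt count but loses the expected-goals share, which is the pattern described above.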
Please share your formula for eGF%. Thanks.
Glad I inspired a post on shot quality. Sad that it is mostly “it exists, but doesn’t really matter” because that is unequivocally wrong. I actually don’t know where to start so I’ll start with this.
“We already know that the impact of shot quality (context + skill) is minuscule in comparison to other factors”
This is absolutely not true. It has been shown that the impact of shot location is minuscule in comparison to other factors, not shot quality. Tom Awad showed this years ago and actually found it to be more important than outshooting the opponent (http://www.hockeyprospectus.com/puck/article.php?articleid=625). Not only does that research tell us that shot quality is at least as important as shot quantity as far as outscoring your opponents is concerned, it tells us that shot location has relatively little importance and that other factors affecting shot quality are more important. As current xG calculations are significantly shot quantity and shot location based, it stands to reason that xG will largely underrepresent overall shot quality.
Have a look at the long-term on-ice shooting percentage of forwards (http://stats.hockeyanalysis.com/ratings.php?disp=1&db=200715&sit=5v5&pos=forwards&minutes=5000&teamid=0&type=goals&sort=ShPct&sortdir=DESC) and tell me if the players at the top of the list have any resemblance to the players at the bottom of the list. You also can’t tell me that the difference between 9% shooting and 7% shooting is insignificant. That’s almost 30% more goals scored on an equal number of shots.
Even the difference between 8.5% shooting and 7.5% shooting is not minuscule. It’s 13% more goals. That is not insignificant. You need a ShotFor% of 53% to overcome that shooting percentage difference. It matters because of the 198 players with >5000 5v5 minutes over the past 5 seasons, 123 of them (62%) had an on-ice shooting percentage outside of the 7.5-8.5% range. Only 25% of the forwards had an SF% outside of the 47-53% range.
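The percentages in this comment can be reproduced with a short calculation. The sketch below assumes goals equal shots times shooting percentage for each side, so the break-even shot share is the point where both sides score equally; the function names are made up for illustration.

```python
def relative_goal_gain(higher_pct, lower_pct):
    """Extra goals (as a fraction) from the higher Sh% on an equal number of shots."""
    return higher_pct / lower_pct - 1.0

def break_even_shot_share(own_pct, opponent_pct):
    """Shot share s at which s * own_pct equals (1 - s) * opponent_pct."""
    return opponent_pct / (own_pct + opponent_pct)

gain_9_vs_7 = relative_goal_gain(0.09, 0.07)
gain_85_vs_75 = relative_goal_gain(0.085, 0.075)
needed_share = break_even_shot_share(0.075, 0.085)

print(f"9% vs 7% shooting:     {gain_9_vs_7:.1%} more goals")
print(f"8.5% vs 7.5% shooting: {gain_85_vs_75:.1%} more goals")
print(f"Shot share needed to offset 7.5% vs 8.5%: {needed_share:.1%}")
```

Under that assumption the three figures come out at roughly 29%, 13%, and 53%, in line with the numbers quoted.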
The idea that shot quality doesn’t matter much or has a minuscule impact on how we evaluate players is just nonsense. Furthermore, shot quality models based primarily on shot location will always underestimate shot quality as Tom Awad (and others) have shown.
Wait…who inspired what now?
Most of my hockey discussions are inspired by hockey.
If sv% gradually decreases as you get further from the net, intuitively location is the culprit. Not quality (unless you consider taking an errant clapper from nowheresville as a quality issue…a coach would).
With all due respect, I’m not sure I follow the rest of your comment. Nobody here is arguing quality doesn’t matter. Just that in a large sample analysis the impact of quality-related variables that we can objectively pinpoint and track/scrape get, by and large, washed out. I proved this over a year ago when the Tango model came about. They might very incrementally bump things. But not so much so that we should chase the dragon. Because unfortunately we don’t have more or better data.
If we did, I would question its objectivity. So many variables go into creating a “quality” shot, from the shooter’s wrist action to the way the puck is cradled before the shot to the ability to aim/elevate, to the velocity, to the whip of the stick, to the position of screening defenders, to the pre-shot movement, to the passing, to the goalie tracking ability…you must know where I’m going here. Can I just leave it at that?
“Nobody here is arguing quality doesn’t matter.”
Let me quote the article:
“We already know that the impact of shot quality (context + skill) is minuscule”
“why does shot quality have so little relative impact on long-term results?”
Both of those statements are an attempt to suggest that accounting for shot quality is of little importance in the grand scheme of things. Shot quality is “minuscule” and has “little relative impact”. That, in my opinion, is patently false and misleading, and underrepresents the importance of shot quality.
“Just that in a large sample analysis the impact of quality-related variables that we can objectively pinpoint and track/scrape get, by and large, washed out.”
This I can agree with. It is much like Matt Cane’s comment “you’re not going to squeeze anything more out of the current data.” Your statement here is not saying that the impact of shot quality is “minuscule” but rather that the components of shot quality that we can objectively measure are “minuscule”.
My problem is the jump from “the impact of quality-related variables that we can objectively pinpoint and track/scrape get, by and large, washed out” to “impact of shot quality is minuscule” and ‘shot quality has little relative impact on long-term results’.
You know what correlates fairly well with Sh%? Ice time. Whether you are looking in-sample correlations or forward or backward in time correlations ice time correlates really well with shooting percentage. It seems coaches can evaluate shot quality and dole out ice time at least partly based on it.
I would love to see how xSh% (xG/shots) correlates with actual shooting percentage and whether it is better or worse than the correlation between ice time and shooting percentage. The gap between what the coaches see and the shot quality that xG accounts for is the absolute minimum underweighting of shot quality in the xG model.
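A minimal sketch of that comparison, assuming you had per-player xG, shots, goals, and ice time, might look like this. The four rows below are placeholders invented only to show the mechanics; the resulting correlations mean nothing until real data is plugged in.

```python
import math

def x_sh_pct(expected_goals, shots):
    """Expected shooting percentage: xG divided by shots."""
    return expected_goals / shots

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rows: (xG, shots, goals, 5v5 minutes per game). Not real players.
players = [
    (38.0, 500, 45, 15.5),
    (30.0, 420, 31, 13.0),
    (22.0, 330, 21, 11.5),
    (41.0, 520, 40, 16.0),
]

expected_sh = [x_sh_pct(xg, shots) for xg, shots, _, _ in players]
actual_sh = [goals / shots for _, shots, goals, _ in players]
ice_time = [toi for _, _, _, toi in players]

r_model = pearson(expected_sh, actual_sh)   # xSh% vs actual Sh%
r_ice_time = pearson(ice_time, actual_sh)   # ice time vs actual Sh%

print(f"xSh% vs Sh%:     r = {r_model:.2f}")
print(f"Ice time vs Sh%: r = {r_ice_time:.2f}")
```

With real data, comparing the two correlations would be one rough way to gauge the gap described above.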