Bayesian Space-Time Models for Expected Possession Added Value – Part 2 of 2

Cowritten by Brendan Kumagai, Mikael Nahabedian, Thibaud Châtel, and Tyrel Stokes

This is part 2 of a two-part series introducing our Bayesian space-time model for evaluating offensive sequences and player actions. In part 1, we outlined the methodology behind the model and explained the reasoning behind the key metric derived from it, Possession Added Value (PAV). In this second part, we illustrate how the model and the PAV metric can be used for team and individual player analysis. Read Part 1 here.

Introduction

Since the Big Data Cup in March, our team has continued to improve the model to better estimate the Possession Added Value of offensive sequences. With some extra time on our hands, we cleaned up a few coding bugs and made two changes to the underlying models themselves. First, we explicitly separated failed and completed passes. Second, we drastically improved the models that predict the location of the next event. With these changes, we can simulate play sequences more realistically and value passing more accurately. As a result, the findings presented below might differ a bit from our Big Data Cup paper.

Key Findings

Rush vs OZ Play

The PAV metric helps us assess a player’s overall contribution within offensive sequences. It can help us break down a player’s offensive impact for each of the following events: entries, passes, shots, turnovers and recoveries. Despite the fact that our dataset is limited in terms of the number of observations for non-Erie players, it is interesting to see some general trends emerge.

When looking at initial results from our model, we notice that both time and space play a huge role in how much value players can expect to add within offensive sequences. The first few seconds following a zone entry are the most dangerous: during this time period, the slot area is expected to be very valuable. If a team manages to access this area of the ice on the rush, they should expect to be able to create a very dangerous offensive sequence. Following a zone entry, defensive structures start to take shape with time, making the slot area harder to access.

Figure 1: The relative value of shots as a function of time since entry and spatial location. The left panel shows shots preceded by pre-shot movement (passes); the right panel shows shots without pre-shot movement.

In the figures above, the area with high expected goals – characterized by brighter colours in the slot – decreases in size as we transition from rush play to offensive zone play. Thus, as time goes on, the defence will set up and offensive teams will have less time and space in high-value areas of the ice, which will lead to a decay of value in these areas.

Passing Plays

The dataset published by Stathletes for the Big Data Cup included passing plays (both failed and completed), which represents a significant portion of the puck plays made on the ice. This resolution of data, not usually available in the public sphere, allowed us to assess the relative value of passing plays as part of offensive sequences by incorporating passes into our PAV equation.

Figure 2: Relative frequency of passes that follow passes as a function of space and time since entry

In general, traditional hockey fans assess the playmaking ability of players by focusing on a few game-changing passes. But these game-changing passes don’t occur often in a game, especially once the defense sets up and locks down the middle of the ice and the attacking team’s advantage in time and space fades away. As a result, we can see in the figure above that the vast majority of passes happen along the boards, through low-to-high or high-to-low plays, with space-time characteristics that yield very low offensive potential.

Figure 3: Pass completion maps using the starting location (left) and target location (right), split by pass type

To reinforce this idea, our model also highlights the significant difference in success rates in the offensive zone between passing from the middle of the ice towards the boards and passing from the boards towards the middle of the ice. To put it simply, passing from high-value areas to low-value areas is easy; passing from low-value areas to high-value areas is difficult and does not often succeed.
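To make the asymmetry concrete, here is a minimal sketch of how completion rates could be tabulated by pass direction. It is a plain empirical count, not the Bayesian spatial model described in this series, and the area labels (`middle`, `boards`) and pass records are hypothetical values invented for illustration.

```python
from collections import defaultdict

# Hypothetical pass records: (start_area, target_area, completed).
# The area labels and outcomes are made up for illustration only.
passes = [
    ("middle", "boards", True),
    ("middle", "boards", True),
    ("middle", "boards", False),
    ("boards", "middle", False),
    ("boards", "middle", True),
    ("boards", "middle", False),
    ("boards", "boards", True),
]


def completion_rates(records):
    """Return {(start, target): completed / attempted} for each direction."""
    attempts = defaultdict(int)
    completions = defaultdict(int)
    for start, target, completed in records:
        attempts[(start, target)] += 1
        completions[(start, target)] += int(completed)
    return {key: completions[key] / attempts[key] for key in attempts}


rates = completion_rates(passes)
for (start, target), rate in sorted(rates.items()):
    print(f"{start} -> {target}: {rate:.2f}")
```

With real event data, the same grouping could be done on finer spatial bins for either the starting or the target location, which is closer in spirit to the two maps in Figure 3.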

In summary, our model suggests that perimeter passes, which make up the bulk of all passes, neither increase nor decrease the value of a possession significantly: they do not infiltrate the high-value regions, but they also carry a low risk of a failed pass or turnover. That said, passes which greatly improve the condition of the puck are some of the highest-value plays possible, and players who can make them consistently can have large PAV passing values. This is in contrast with our Big Data Cup presentation and is the biggest change due to the model tweaks mentioned in the introduction.

However, passing plays are one of the most difficult hockey events to properly assess given that many external factors would need to be taken into consideration to understand the decision-making process behind them. For instance, there might be some long-term benefits in prolonging puck possession by passing to less favourable areas of the OZ, depending on the circumstances, which is reflected in the (slightly) positive average value of successful passes, even those around the perimeter.

Furthermore, without player movement data, our model can only capture proxies for the opposing team’s defensive structure, which is key to understanding the decision-making process of players when passing the puck. Therefore, our model cannot fully differentiate a case where a player willingly passes the puck to a lower-value part of the ice to gain a certain tactical advantage from a situation where a poor decision leads to a pass that truly hinders the team’s ability to generate offence. To go further, one can only dream of the day when data is available on players’ passing and receiving ability to assess the probability of a successful pass, along with distance, backhand vs forehand, the presence of defensemen in the way, their pass-breaking ability, the position of their stick, etc.

Other Events

The other events’ involvement in the PAV equation is much simpler to capture. For the most part, players draw positive PAVs from zone entries. This is in line with the foundations of our model, as a zone entry moves a team into the offensive zone and, in most cases, into a more favourable position to generate offence. Even if a dump-in is historically half as productive as a controlled entry [1], it still moves the needle upwards, going from outside of the offensive zone to the possibility of a shot. Similarly, players generally draw positive PAVs from puck recoveries, as the recovery of the puck is tied to the possibility of generating offence. On the other hand, all players get a negative PAV from turnovers. Finally, about 90% of players receive a positive PAV from shots; the rare cases of negative shot PAV can be interpreted as a player taking an abundance of lower-danger shots when there are generally more beneficial options available.

How to use PAV

As a metric assessing the full involvement of a player inside the game, in connection with the causes and consequences of his decisions, PAV should be seen as an overview go-to stat, first and foremost. However, the components of PAV open a full realm of possibilities for in-depth analysis into the whys and hows. 

The performance of a player on PAV and each of its components can be compared to the team average or the league average, and adjusted for position, deployment, age or, for scouting purposes, league. Once such a metric is available across different leagues, it would be easy to weight it so the performance of a 17-year-old player in the OHL can be compared to the impact of an 18-year-old prospect in the Finnish Liiga, for instance.

Figure 4: PAV per game played for the Erie Otters Players in the 2020-2021 season split by event type

The figure above shows how each of the Erie Otters’ players contributed to their team’s offensive sequences. The best players (i.e., Golod, Yetman, Singer, Hoffman) created significant value from zone entries, shots and puck recoveries when compared to the rest of the team.
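For readers who want to reproduce this kind of breakdown on their own event data, the sketch below shows one way to aggregate per-event PAV values into PAV per game played, split by event type. The player names, game IDs and PAV values are hypothetical, and the record layout is an assumption rather than the format of the Stathletes data.

```python
from collections import defaultdict

# Hypothetical event log: (player, game_id, event_type, pav).
# All values are invented for illustration.
events = [
    ("Player A", 1, "entry", 0.020),
    ("Player A", 1, "pass", 0.010),
    ("Player A", 2, "shot", 0.030),
    ("Player A", 2, "turnover", -0.015),
    ("Player B", 1, "recovery", 0.012),
    ("Player B", 1, "pass", -0.004),
]


def pav_per_game(log):
    """Return {player: {event_type: total PAV / games played}}."""
    totals = defaultdict(lambda: defaultdict(float))
    games = defaultdict(set)
    for player, game_id, event_type, pav in log:
        totals[player][event_type] += pav
        games[player].add(game_id)  # count each game once per player
    return {
        player: {
            event_type: total / len(games[player])
            for event_type, total in by_type.items()
        }
        for player, by_type in totals.items()
    }


breakdown = pav_per_game(events)
for player, by_type in breakdown.items():
    print(player, {k: round(v, 4) for k, v in by_type.items()})
```

Note that this counts only games in which a player has at least one logged event; a real implementation would more likely take games played from the roster data.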

But beyond the raw numbers, it was also very intuitive to build heatmaps showing where players created value on the ice, both across all events and for each of the PAV components. This sort of visual is critical to making a new metric like PAV accessible and usable. In addition, this sort of heatmap indicates where a player is adding value, rather than only showing the areas of the ice in which a player is active. There is a significant difference between being active and actually creating value. This is where PAV can come in and add a new layer to the analysis.

Figure 5: PAV maps for a selection of the Erie Otters Defensemen. Yellow indicates areas where the player generated more added value than expected and darker blue indicates less value generated than expected.

Case Study: Connor Lockhart

To put it simply, PAV answers a very simple question regarding individual player performance: on average, by how much does a player improve or worsen the condition of the puck on the ice for his team with each puck touch?

In terms of scouting, PAV can be used as a starting point to help identify strengths and weaknesses in a prospect’s game. Combining it, afterwards, with other metrics as well as video analysis will enhance the scouting process. The following paragraphs exemplify how our metric can be utilized to analyze a prospect’s game.

Connor Lockhart is a prospect eligible for the upcoming NHL Draft. Looking at his player card, we notice that this undersized right winger adds value to his team’s offensive possessions in different ways.

Figure 6: Connor Lockhart Player Card

In terms of entries, Connor Lockhart ranks around the 38th percentile in PAV among OHL forwards. With a 50% carry rate on zone entries, Lockhart ensures full control of the puck, and with it the chance for his team to generate sustained offensive pressure, on only every second entry. As a right winger, he tends to enter the zone from the right side rather than the middle of the ice. Therefore, in order to improve his PAV in this aspect of the game, Lockhart could work on using crossovers to gain speed and change lanes through the neutral zone. These lane changes will help him initiate more entries through the middle of the ice and generate a higher volume of chances off the rush.

In terms of recoveries, Lockhart’s strength in this area of the game shows in his PAV, which ranks around the 53rd percentile among league forwards. Continuing to focus on smartly recovering the puck on both sides of the OZ and playing inside contact along the boards will help Lockhart sustain his above-average track record in this category.

From a shooting perspective, most of his attempts are in the slot (high danger) area making him an offensive threat for the opposition. His shooting PAV, which is around the 53rd percentile, helps highlight his ability to quickly release his shot in tight areas of the OZ, as a 16-year-old forward: in 2019-2020, Lockhart’s release was about 0.3 seconds faster than league average.

Lockhart is below league average both in terms of completed (46th percentile) and incomplete passes (40th percentile). When analyzing his passing clusters, we notice that most of his passes do not add much value from an offensive standpoint (low cycle passes, low to high passes…). In order to improve his passing PAV, Connor Lockhart should leverage and work on two key elements of his game. First of all, starting with a good puck reception is key: improving puck control with a better controlled first touch in a dynamic position (catching the puck in a weight shift or a crossover) would go a long way in allowing Lockhart to open up passing lanes in valuable areas of the ice. Once the valuable passing lanes are open, moving the puck quicker would limit his opponents’ reaction time and allow him to complete a higher rate of valuable passes. In 2019-2020, Lockhart was about 0.1 seconds slower than league average to pass the puck.

All in all, Lockhart ranked around the 50th percentile among OHL forwards in his rookie year, looking at average PAV per event, being the 7th best attacker on his team. He has shown some interesting signs of offensive upside, which could help convince an NHL team to take a chance on him in rounds 4-7 in the upcoming NHL draft.
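The percentile rankings quoted in this case study can be reproduced with a simple calculation once each forward's average PAV per event is known. The sketch below uses one common convention (the share of the population strictly below the player's value); the league values are hypothetical, and the convention itself is an assumption, since the player card does not specify which definition it uses.

```python
def percentile_rank(value, population):
    """Percentage of the population with a value strictly below `value`."""
    if not population:
        raise ValueError("population must not be empty")
    below = sum(1 for other in population if other < value)
    return 100.0 * below / len(population)


# Hypothetical average PAV per event for a group of forwards.
league_avg_pav = [-0.002, 0.000, 0.001, 0.003, 0.004, 0.006, 0.009, 0.012]
player_avg_pav = 0.004

rank = percentile_rank(player_avg_pav, league_avg_pav)
print(f"Percentile rank: {rank:.0f}")
```

The same function can be applied separately to each PAV component (entries, recoveries, shots, passes) to build a card like the one in Figure 6.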

Future Work

While we have presented preliminary rankings of OHL players, there are many avenues to extend our work. There are two main branches to explore moving forward with the model we have presented.

First, we can continue to build upon our model and methodology. This can include expanding our model to the full ice, quantifying defensive contributions, accounting for quality of teammates, and extending to higher resolution tracking data.

Second, this article is just scratching the surface of potential data analyses with the PAV metric we have developed. The robust nature of this metric would allow us to cluster players by play style, analyze the spatiotemporal changes in PAV over a season, and incorporate uncertainties into player and sequence evaluation.

Bibliography

[1] Châtel, Thibaud. “Introducing Offensive Sequences and the Hockey Decision Tree.” Hockey Graphs, 2020, https://hockey-graphs.com/2020/03/26/introducing-offensive-sequences-and-the-hockey-decision-tree/.

Data May Not Drive Play, But It Should Drive Decisions

It’s not easy creating a data-driven decision-making culture in any organization, let alone one as bound in tradition and lore as the NHL, where hockey men are imbued with mythical powers of observation and judgement just by virtue of having played the game. And yet, the NHL is clearly moving in that direction. It may be at a glacier-like pace, but I suppose that makes sense, what with the ice and all. Despite some early stumbles, it’s probably safe to say that it is only a matter of time before data-driven decisions are the norm rather than the exception. Whether that happens while we still have glaciers is another matter.

But even when there is managerial will and top-down direction to move toward a data-driven culture, it is often difficult to introduce data analysis into the existing decision-making process of an organization. Deep structural changes are necessary, but staff will also need a robust change management process. It’s hard enough to get people to accept change, but a new culture requires that they go beyond acceptance and embrace it as a new way of doing things. This is a difficult process in any organization. However, it is made more difficult in the hockey world where many in positions of authority are in those roles precisely because they “played the game” and understand the traditional way of doing things.

But what if you could start from scratch and build something from the ground up? 


Bayesian Space-Time Models for Expected Possession Added Value – Part 1 of 2

Cowritten by Brendan Kumagai, Mikael Nahabedian, Thibaud Châtel, and Tyrel Stokes

Introduction

This is part one of a two-part series introducing our Bayesian space-time model for evaluating offensive sequences and player actions. In this part, we describe the model and the key metrics derived from it. In part 2, we will show how the model can inform and integrate with player and team evaluations.

Hockey is a game of making the best possible decision in the shortest amount of time. Players need to react quickly to form a chain of plays to create valuable scoring chances. Our goal is to quantify the value of player actions as a function primarily of space and time in the offensive zone. This is to recognize that the puck location and the threat-level it poses to the opposition is a key driver of what options will be available to the puck carrier and – as a result – what is likely to happen next. Ultimately, we want to credit players that are able to make high quality decisions and difficult plays which advance the puck into more valuable locations on the ice.

To this end, we primarily build off of 3 previous papers. First, we use the conceptual framework of understanding play sequences in hockey (Châtel, 2020) from our team member Thibaud Châtel. Second, we adapt the multi-resolutional Expected Possession Value modelling framework pioneered by Cervone et al. (2016) in basketball to work with the detailed play-by-play data generously provided by Stathletes as part of the Big Data Cup hackathon. Finally, using this infrastructure we propose a metric called the Possession Added Value (PAV) based on Karun Singh’s Expected Threat Model in soccer (Singh, 2018) which has previously been adapted to hockey (Yu et al., 2020).

Due to the timeframe of the competition and the complexity of our proposed model, we decided to narrow our scope down to offensive even strength sequences that begin with an entry and end with either a shot or a whistle. By only considering offensive sequences the model as it stands can only properly evaluate the actions of the offensive team.

Before we dig into our methodology, here is an example of what our end product will look like in Figure 1 below. We assign each event in this entry-to-exit/whistle sequence a Possession Added Value (PAV), which can be thought of as the increase in the probability that we score in the sequence as a result of performing the observed action. For example, the pass by Landon McCallum from the top of the left circle into the slot adds 0.0677 goals to the expected value of the possession.

Figure 1: An offensive zone sequence with our Possession Added Value (PAV) metric
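Read literally, the description above treats PAV as the change in the estimated value of the possession caused by an action. The sketch below illustrates only that arithmetic; the function name and the before/after values are hypothetical, chosen so that their difference matches the 0.0677 quoted for the McCallum pass, and the real values would come from the fitted model.

```python
def possession_added_value(value_before: float, value_after: float) -> float:
    """Change in the estimated probability of scoring in the sequence."""
    return value_after - value_before


# Hypothetical possession values; only their difference (0.0677) is from the text.
value_before_pass = 0.0200
value_after_pass = 0.0877

pav = possession_added_value(value_before_pass, value_after_pass)
print(f"PAV of the pass: {pav:+.4f} goals")
```

Under this reading, an action that moves the puck to a less valuable state, such as a turnover, would receive a negative PAV.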


How important are faceoffs to possession in women’s hockey?


This was co-written by Mike Murphy, Alyssa Longmuir, and Shayna Goldman based on work for the Big Data Cup and Ottawa Hockey Analytics Conference.

As a result of women’s hockey analytics needing to play “catch up,” it’s not unusual to see analysts relying on stats that have already been proven to be less insightful in the men’s game. One such area of the game that is frequently highlighted at the collegiate, professional, and international levels of the women’s game is faceoffs.

Faceoffs have been covered extensively in men’s hockey, and much of that work points to the fact that faceoff wins aren’t all that they’re cracked up to be. Back in 2015, Arik Parnass, now of the Colorado Avalanche, found, “This … aligns with what hockey analysis has found over the years when it comes to faceoffs. Overall, winning them just isn’t as important as it’s made out to be.”

While a great deal of work has been done on the importance (or lack thereof) of faceoffs in the men’s game, the same cannot be said of women’s hockey. But why would it be any different?


Behind the Numbers: Pareto’s Principle, Power Law Distribution, and when tracking data does not matter

Every once in a while I will rant on the concepts and ideas behind what numbers suggest in a series called Behind the Numbers, as a tip of the hat to the website that brought me into hockey analytics: Behind the Net. My ramblings will look at the theory and philosophy behind analytics and their applications given what is already publicly known, keeping my job safe while still getting to interact with the public hockey-sphere.

Hello. Hope everyone is enjoying my return after a long hiatus. I am back from my busy schedule of helping run a tracking company that sells private tracking data to argue here against overvaluing private tracking data (and in addition black-box models)… or really I’m suggesting to not underrate what’s in the public.

You heard that right. The guy that has vested interests in demonizing public models and data is going to defend public models and data!


Behind the Numbers: Theory on Environmental Impacts and Chemistry

We’re bringing it back! Every once in a while I will rant on the concepts and ideas behind what numbers suggest in a series called Behind the Numbers, as a tip of the hat to the website that brought me into hockey analytics: Behind the Net. My ramblings will look at the theory and philosophy behind analytics and their applications given what is already publicly known, keeping my job safe while still getting to interact with the public hockeysphere.

I’m back and here to ramble on things like models, sheltering, and environmental impacts on the results we measure.


Quantifying the influence of no-trade clauses, signing bonuses and LTIR on NHL cap tables

The recently agreed CBA extension and MOU (April 2020) includes provisions suggesting a flat salary cap for years to come, and as a result, general managers and players have experienced an unprecedented draft, free agency and arbitration marketplace this fall. NHL league activity is expected to continue under a particularly unique context caused by loss of hockey related revenue from the Covid-19 pandemic, and the upcoming Seattle expansion draft.

Under this challenging and uncertain financial landscape, I endeavored to conduct contract research to better identify league-wide contract negotiation trends and evaluate the anticipated flexibility of NHL teams’ salary cap structures by looking at:
– No-trade clauses
– Signing bonuses (S.B.)
– Injury reserve (IR) and long term injury reserve (LTIR)

Previous Contract Analysis Work

Having started my journey in analytics with the opportunity to grow as part of the inaugural hockey-graphs mentorship program, it is a privilege to take this opportunity to build on the inspiring contract negotiation and player valuation work of Matt Cane (The Time Value of Money and Player Valuation), Mike Zsolt (The Financial Frontier: Defining characteristics of competitive salary cap management), Josh and Luke Younggren (Projecting NHL Skater Contracts for the 2019 Offseason), and Shayna Goldman (ISOLHAC: How can we better our contract analysis), amongst other distinguished leaders in the analytics community.


How Canada and the US differ in their roster philosophies during Olympic cycles

While the 2022 Beijing Winter Olympics are still over a year away and the memories of Pyeongchang are still fresh in many fans’ minds (with only one World Championship taking place since then), centralisation for both Canada and the USA is rapidly approaching. Countries historically pick their rosters around late May or early June in the year prior to the Olympics to allow time for players to train, bond and participate in exhibition games before the final roster selection, which occurs just a month before the big event. What goes on during those 9 months prior to skating out onto that Olympic ice surface is largely kept secret, with roster decisions often announced in a somewhat cut-throat manner and additional players often drawn in from outside the bubble to the surprise of everyone. Throughout this article, we will be looking at the survival rates of skaters on National Teams over the past 30 years and investigating what this means for roster selection heading into Beijing.

In 2018, between the two teams, there were only 3 first-time players. Cayla Barnes and Sidney Morin both lined up for the USA on the big stage while Sarah Nurse did the same for Canada. That is of course not to say these players didn’t have prior international experience. Nurse made her national team debut at the 2015 4 Nations Cup and had also represented Canada at the U18 level. Cayla Barnes, while just 18 at the time of centralisation, had played for the United States 3 times at the U18s, including captaining them to a gold medal that very year, while Morin had previously represented the USA on the 2017 The Time Is Now Tour. While there were only 3 ‘true’ rookies between the two teams, that is not to say this was the same line-up as at the previous Olympics in Sochi: Team Canada had 8 players missing from their gold medal-winning Sochi side, and the USA 7. I have put their names below as we will return to them later.

CANADA                  USA
Caroline Ouellette      Alex Carpenter
Catherine Ward          Anne Schleper
Gillian Apps            Josephine Pucci
Hayley Wickenheiser     Julie Chu
Jayna Hefford           Kelli Stack
Jennifer Wakefield      Lyndsey Fry
Lauriane Rougeau        Michelle Picard
Tara Watchorn

Skaters from the 2014 rosters not included in the 2018 rosters

Building a Shot-Plotting App in Shiny

For me at least, hand tracking is 99% of the time born out of necessity. 

The only way I am ever going to get location data for shots is if I break out a multicoloured pen and write down all the locations and numbers myself. It isn’t, however, exactly the quickest process to deal with.

The thing is, I actually really enjoy hand tracking. It keeps me focused on the game at hand and stops my mind from wandering. The issue comes when it’s time to digitise that information for analysis. I have written about this before over at The Ice Garden, back when I tracked an entire season of the Australian Women’s Hockey League. That season it took me around an hour of straight work to plug in every piece of information so that Tableau could process it, and as my life got busier, the amount of free time I could dedicate got less and less.

The idea to force a Shiny app to do something it has no right to do came out of necessity. Partially, it was because I wanted to be able to show heat maps to the Head Coach of the local team I work with during intermission. Mostly, though, it was because my Masters project consists of getting school kids aged 11+ involved in sports analytics. I really wanted them to be able to produce their own heat maps, and yet I really did not want to attempt to explain the complexities of Kernel Density Charts to a collection of 12-year-olds.

So here we are. 

The Hockey Plotter 1.1


Chatter Charts – Visualizing Real-Time Fan Reactions

Today, I’ll explain the methodology behind Chatter Charts and show you how I use statistics, R and Python to analyze hockey from a completely unexplored angle: your point of view.

I. Introducing Chatter Charts

Chatter Charts is a sports visualization that mixes statistics with social media data. And unlike most charts, it is specifically designed to thrive on social media; it is presented in video and filled with volatility, humour, and relatable moments.

It assumes a game is like a linear story—filled with peaks and troughs—except every story is written by fan comments on social media. It actually tries to recreate the emotional roller coaster fans tend to experience when watching sports.


But most people don’t know about the math and code behind Chatter Charts. It isn’t just me picking words I think are funny or a simple word count—it uses a topic modeling technique called TF-IDF to statistically rank them.

I want to go through that with you today.
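As a rough preview of the idea, here is a minimal TF-IDF sketch in plain Python. It uses one common variant (term frequency times the natural log of inverse document frequency); how Chatter Charts actually groups comments into documents, and which TF-IDF variant it uses, is not stated in this excerpt, so the time-window grouping and the sample tokens below are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score terms per document; `documents` is a list of non-empty token lists."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized fan comments, grouped into three time windows.
windows = [
    ["goal", "goal", "wow"],
    ["ref", "penalty", "wow"],
    ["goal", "save", "wow"],
]

scores = tf_idf(windows)
for i, window_scores in enumerate(scores):
    top_term = max(window_scores, key=window_scores.get)
    print(f"window {i}: top term = {top_term}")
```

A word that appears in every window, like "wow" here, gets a score of zero, which is why TF-IDF surfaces the words that are distinctive to a moment rather than the ones that are merely frequent.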
