On Saturday, November 4th, we hosted the first-ever Hockey Graphs Analytics Data Sprint. Teams had six hours to take raw data and do something interesting with it, as a trial run for the Vancouver Hockey Analytics Conference. Local teams met up at La Casita here in Vancouver, and we also had online participants.
Thanks to all of the people who helped put it together, and thank you to all those who participated, especially those who travelled from as far away as New York.
In this post we link to the finished results and announce the winners. Their work is in a GitHub repo, which you can use for your own data analysis!
The tracking and analysis company HockeyData donated the dataset, a 40MB CSV file of play-by-play data consisting of anonymized players and teams within the AHL.
As with any data project, there was some effort in learning the dataset and adapting the data to a usable format for each specific project.
The Vancouver Track was won by Team 4: Sarah B, Dani C, Lucas W, Matthew R and Kristen B from the SFU Sports Analytics Club. Their analysis focused on the efficiency of breaking out through the neutral zone.
Building on Eric Tulsky’s poster at the Sloan Sports Conference, our project investigates the power play efficiency of AHL teams by grading their breakout plays. Tulsky found that offensive stars differentiate themselves from others by gaining more zone entries and more specifically gaining the zone with possession of the puck.
We graded teams’ breakout plays from behind their own net to a zone entry on the power play. We did this by quantifying the average time it took a team to gain the offensive zone and the rate at which teams successfully gained it. Additionally, we calculated the percentage of the time teams gained the offensive zone with possession and by dump-and-chase.
The most difficult part of this project was classifying breakout plays on the power play. We classified as a breakout any continuous set of actions in which a team had possession of the puck after starting behind its own net. Any breakout that ended with a zone entry was classified as successful. Any other action that ended a breakout meant that the team did not gain the zone, and we classified these breakouts as unsuccessful.
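The classification rule above can be sketched in a few lines of Python. This is only an illustration, not the team's actual code: the `Event` fields and the action labels (`pickup_behind_net`, `zone_entry`, and so on) are hypothetical, since the HockeyData schema is not described in this post, and the events are assumed to be already filtered to power-play time and sorted by game clock.

```python
from dataclasses import dataclass


@dataclass
class Event:
    team: str    # team credited with the event (assumed field)
    time: float  # seconds elapsed in the game (assumed field)
    action: str  # hypothetical labels: "pickup_behind_net", "pass", "zone_entry", ...


def classify_breakouts(events):
    """Split time-ordered events into breakouts and mark each as successful or not."""
    breakouts = []
    current = None  # breakout in progress, if any
    for ev in events:
        # An event by the other team ends the breakout without a zone entry.
        if current is not None and ev.team != current["team"]:
            current.update(end=ev.time, success=False)
            breakouts.append(current)
            current = None
        if current is None:
            # A breakout starts when a team picks up the puck behind its own net.
            if ev.action == "pickup_behind_net":
                current = {"team": ev.team, "start": ev.time}
        elif ev.action == "zone_entry":
            current.update(end=ev.time, success=True)
            breakouts.append(current)
            current = None
    return breakouts  # a breakout still open when the data ends is dropped


def summarize(breakouts):
    """Per team: zone-entry success rate and mean seconds from start to entry."""
    by_team = {}
    for b in breakouts:
        s = by_team.setdefault(b["team"], {"n": 0, "entries": 0, "entry_time": 0.0})
        s["n"] += 1
        if b["success"]:
            s["entries"] += 1
            s["entry_time"] += b["end"] - b["start"]
    return {
        team: {
            "success_rate": s["entries"] / s["n"],
            "mean_seconds_to_entry": (
                s["entry_time"] / s["entries"] if s["entries"] else None
            ),
        }
        for team, s in by_team.items()
    }


# Toy events, invented for illustration only.
events = [
    Event("A", 0.0, "pickup_behind_net"),
    Event("A", 3.0, "pass"),
    Event("A", 8.0, "zone_entry"),
    Event("A", 20.0, "pickup_behind_net"),
    Event("A", 24.0, "pass"),
    Event("B", 26.0, "takeaway"),
]
breakouts = classify_breakouts(events)
stats = summarize(breakouts)
```

In this toy sequence team A has one successful breakout and one that ends in a turnover. Splitting successful entries into carries and dump-ins would only need one more hypothetical label on the entry event.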
The other teams focused on different areas of the game, and have provided write-ups on their projects:
Analysis of zone entries and puck carrying
Team 1: Nathan L, Marco D, Hamish C, Ben E
“In the first part of our analysis during the data sprint, we tried to take a look at the time different players spent carrying the puck. We plotted time players spent with the puck against their total movement up and down the ice, with the size of the data point indicating the number of times they touched the puck. Players in our data set were unnamed, but we were curious whether the data points showing a lot of touches, movement, and time spent with the puck represented players that are generally thought to be highly skilled.
We also analyzed the zone-entry breakdown of the teams in our data set in an attempt to find out whether there was a relationship between how often a team enters the zone with control of the puck and how often they win. We were surprised to find that teams had more success by chipping the puck in more often, and wondered whether that might be a property exclusive to the AHL due to the skill level of the league.”
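One way to check a relationship like the one Team 1 describes is to compute each team's share of controlled entries and correlate it with win percentage. The sketch below is an assumption about how that could be done, not Team 1's method: the entry labels (`carry`, `dump`) are hypothetical and every number is a toy value, not taken from the dataset.

```python
from math import sqrt


def controlled_entry_share(entries):
    """entries: iterable of (team, kind) pairs, kind being 'carry' or 'dump' (hypothetical labels)."""
    counts = {}
    for team, kind in entries:
        c = counts.setdefault(team, {"carry": 0, "dump": 0})
        c[kind] += 1
    return {t: c["carry"] / (c["carry"] + c["dump"]) for t, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data for three anonymized teams, invented for illustration only.
entries = (
    [("A", "carry")] * 3 + [("A", "dump")] * 1
    + [("B", "carry")] * 2 + [("B", "dump")] * 2
    + [("C", "carry")] * 1 + [("C", "dump")] * 3
)
win_pct = {"A": 0.4, "B": 0.5, "C": 0.6}

share = controlled_entry_share(entries)
teams = sorted(share)
r = pearson([share[t] for t in teams], [win_pct[t] for t in teams])
```

A negative `r` would match the direction Team 1 reported, but with only a handful of teams in a league a correlation like this says little on its own.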
Play style/Passing networks
Team 2: Kyle Stich (@k_sticher), Steve W, Mike D, Dylan H
For the data sprint, we attempted to pursue two paths, with the analysis meant to serve management, scouting, and coaching. Due to the time constraints, we were only able to partially finish one of the paths. The first path, which remained completely unfinished, was to create a visualization of each team’s passing networks. We thought this would provide valuable insight for game planning against opponents and for identifying the strengths and weaknesses of your own team.
The second path was the one where we arrived at a partial solution, which we would have liked to expand on given more time. The big picture of this path was to identify different playing styles for individual teams and individual players. We were only able to come up with partial playing styles for the individual players. To achieve this goal, there was work done in Excel, SQL, and R. Hopefully, all the files and scripts have made it into the publicly available file share. For those trying to follow the R code: sorry, it got sloppy! If you have questions about it, please feel free to contact Kyle Stich via one of the
The effect of hits on breaking up possession
Team 3: Brandon C (@thefruz), Sam C, Stephen G
Team 3 started our investigation with a simple question: What, if any, is the statistical impact of hitting your opponent on their ability to generate shots, goals and sustain possession time? In order to answer this question, we took the play-by-play game event data and turned it into a chain of events per team possession. From there, we can determine things like the average time a team possesses the puck before either losing it or generating a stoppage in play. We can also answer questions like how often a team generates shots and goals per hour of “possessed puck”. To add hits to the equation, we can segregate our sample into team possessions where they sustained a hit-against versus possessions where they were not hit. From there, we can compute interesting tidbits of information about how the two populations of team possessions differ. The interesting stats we computed were:
| | Mean Shots Per Possession-Hour | Mean Goals Per Possession-Hour | Mean Passes Per Possession-Hour | Mean Length of Possession (sec) |
|---|---|---|---|---|
| Untouched | 252 | 4.5 | 868 | 4.2 |
| Sustained Hits | 107 | 1.7 | 450 | 6.5 |

What stands out is that teams that sustain a hit take far fewer shots on goal and score fewer goals per hour of possession. One possible explanation is that hitting is, in fact, an effective form of shot suppression and defensive play.

Future areas of interest are to ask more contextual questions about the differing types of possession. For example, one possible reason teams take fewer shots when they get hit is that they are getting hit in the defensive zone while trying to break out. Another interesting question is why teams possess the puck for longer after absorbing a hit, when the usual objective of a hit is to separate the puck from the puck-carrier.
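For readers who want to reproduce stats like the ones above, a per-possession-hour rate is a count divided by possession time expressed in hours. The Python sketch below pools all possessions in each group (total shots over total possession time), which is one plausible reading of "mean shots per possession-hour" and may differ from what Team 3 did. The record fields (`duration_s`, `shots`, `goals`, `passes`, `hit_against`) are assumed for illustration, and the sample records are invented.

```python
from collections import defaultdict


def rates_by_hit_status(possessions):
    """Pooled per-possession-hour rates for hit and non-hit possessions."""
    totals = defaultdict(
        lambda: {"n": 0, "seconds": 0.0, "shots": 0, "goals": 0, "passes": 0}
    )
    for p in possessions:
        group = "Sustained Hits" if p["hit_against"] else "Untouched"
        t = totals[group]
        t["n"] += 1
        t["seconds"] += p["duration_s"]
        for key in ("shots", "goals", "passes"):
            t[key] += p[key]

    rates = {}
    for group, t in totals.items():
        if t["seconds"] == 0:
            continue  # no possession time recorded, so rates are undefined
        rates[group] = {
            # count * 3600 / seconds is the count per hour of possession
            "shots_per_hour": t["shots"] * 3600 / t["seconds"],
            "goals_per_hour": t["goals"] * 3600 / t["seconds"],
            "passes_per_hour": t["passes"] * 3600 / t["seconds"],
            "mean_length_s": t["seconds"] / t["n"],
        }
    return rates


# Toy possession records, invented for illustration only.
possessions = [
    {"duration_s": 4.0, "shots": 1, "goals": 0, "passes": 1, "hit_against": False},
    {"duration_s": 2.0, "shots": 0, "goals": 0, "passes": 0, "hit_against": False},
    {"duration_s": 6.0, "shots": 0, "goals": 0, "passes": 1, "hit_against": True},
]
rates = rates_by_hit_status(possessions)
```

Because possessions last only a few seconds on average, even a single shot in a short possession turns into a large hourly rate, which is why the hourly figures look so high next to per-game totals.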
Alex Novet – Winner!
Alex Novet started with data exploration and zone exit success. From that he created an Expected Zone Exit (xZE) model. He finally pivoted to evaluating teams and players on zone exit value.
The other teams worked on a range of topics, including:
- Analysis of Zone Exits (Team 1 – Matt C)
- Standings, Zone Exits and Dumping the puck (Team 2 – Russ I)
- Data visualization (Team 4 – William L, Erin W)
- Shot and Pass Location (Team 7 – RJ W)
- Difference in offensive events splits (Team 9 – Matthew B., Jason B)
- Distances of Passes (Team 12 – Rakish)
Overall this was a very successful event and we learned a lot. Thank you to all those who helped put the event together and to La Casita for donating their restaurant space for the day.
We hope to run a similar event at the Vancouver Hockey Analytics Conference and we look forward to seeing you there!