This week, I wanted to illustrate several ways that passing data can be used to more accurately assess players, offer examples of advance scouting and opposition analysis, and identify how and where teams attack and defend. On Monday, I gave you the basics for passing data. Tuesday saw a deeper look at network and linkup data. Wednesday introduced lane Corsi concepts. Yesterday, I combined most of this to illustrate how it could be used to prepare for an opponent.
Today is much lighter. Today is about releasing our data to the community. Enjoy!
Just like last season, our data will be out there for people to work with as they please. I’ll be writing on a lot of new concepts, metrics, and, quite frankly, pioneering new analysis. I like to put our data out there so people can play around with it and attempt to reproduce my own work, and simply because I believe the community as a whole is smarter than any one individual. I never want our project to be mistaken for some black box, and this is part of that.
A few words on tracking. We’re not perfect. The data won’t be perfect. The NHL data isn’t perfect. People still arguing over imperfect data need to get over it. Most of the people tracking games with me tracked some sample games and were given sample games to watch over the summer. I also did a lot of checking during the early part of the season, and some people tracked many games with me last season. So, there have been processes in place to align everyone with the same principles.
Over a large enough sample size, mistakes will have less and less impact on the dataset as a whole. I believe in data being available and transparent, especially when you’re making claims about one dataset’s worth over another. If we have different definitions of a particular event (scoring chances for example), we’ll likely have different outcomes. That doesn’t mean one set of data is more valuable than the other – all it means is that the events are defined differently. Everything after that is self-serving narrative.
If you are interested in volunteering to track games and expand our database, you can reach me at firstname.lastname@example.org or on Twitter at @RK_Stimp. If you don’t have time to track, you can always donate to the project here. Data and explanations below!
First off, I’m going to take this space to thank the people who have been tracking this season. The first two are Bill Jennings (@jenningsbill) and Jason Reynolds (@jaycrey), who have been tracking the Toronto Maple Leafs. The thousands of words I inundated you all with this week would not have been possible without their efforts. The rest of the team, and the teams they track, are listed below. Anyone who has tracked at least a single game will be on here, unless I forgot someone. If I did, let me know and I’ll amend the list.
Be sure you thank them.
Brian Franken (@onepasthunter) and Kevin Winstanley (@KiloAlphaWhisky) – New Jersey Devils
Jesse Severe (@jessesevere) – Washington Capitals and some Carolina Hurricanes
Shane O’Donnell (@shane1342o) – Florida Panthers
Jeremy Davis (@jeremydavis89) – Vancouver Canucks
Krista Asadorian (@kasadorian) – Dallas Stars
Sara Garcia (@sara_lnr) and Benoit Roy (@Benroy_) – San Jose Sharks
John Pullega, Sean Wentzell (@SeanWentzell), and Emma Kaiser (@triona05) – Chicago Blackhawks
Jessica Fong – Pittsburgh Penguins
Mike Little (@jmikelittle) and Shreyas – Ottawa Senators
Alan Wells – Tampa Bay Lightning
Derek Fetters (@DSF456) – New York Rangers
Jacob Reid (@jakereid) – Edmonton Oilers
So, how to read the gamesheets? There are definitions on the second tab of the workbook, so that will certainly help. Most of the columns will be self-explanatory (Period, Time, Home, Away, etc.), but there are a few particulars to point out.
The strength column is the strength of the shooting team, which is found in column D. Columns G – I are the primary, secondary, and tertiary passes in the sequence. Columns J – L are where our codes are entered for the type of pass and where it originated. You’ll want to refer to the definitions for these.
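If you would rather work with the files in code than in a spreadsheet, the column letters above can be turned into list positions. This is only a minimal sketch: it assumes the csv exports keep the same column order as the workbook, which I have not guaranteed here, and the helper names `col_index` and `pick` are made up for illustration.

```python
def col_index(letter: str) -> int:
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to a zero-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def pick(row: list, first: str, last: str) -> list:
    """Return the cells of `row` from column `first` through column `last`, inclusive."""
    return row[col_index(first): col_index(last) + 1]


# Hypothetical usage on one parsed csv row (a list of strings):
# passers = pick(row, "G", "I")     # primary, secondary, tertiary passes
# pass_codes = pick(row, "J", "L")  # pass type / origin codes
```

If your export does have a header row, reading by header name is safer than reading by position.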
Columns Q – S are for rebounds from specific passing sequences. These columns record shot attempts from rebounds in the home plate area. So, if a goalie kicked a puck to the corner and someone picked it up and fired it back on net, we would not record that. Just home plate rebounds.
We’re capturing this to, hopefully, determine if specific movement is more likely to create a rebound opportunity (e.g. are certain goalies better at tracking right to left and controlling the initial shot, things like that). The number in the RB column identifies the shooter who attempts the rebound shot.
The score state (Column T) is with respect to the home team (Column X). Game ID and Dates are for those who wish to sync up our data with other sources. The goalie for each shot is there as well if there are those of you inclined towards goalie analysis and want to know what types of shot sequences are happening in front of him.
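Because the score state is recorded from the home team's point of view, anyone analyzing shots from the shooter's perspective will need to flip it for away-team shots. Here is a minimal sketch of that flip. It assumes the score state is stored as a signed goal differential (for example -1, 0, 1), which you should confirm against the definitions tab, and the function name and the team abbreviations in the usage lines are hypothetical.

```python
def shooter_score_state(home_score_state: int, shooting_team: str, home_team: str) -> int:
    """Re-express a home-relative score state from the shooting team's perspective.

    Assumes `home_score_state` is a signed goal differential (home minus away).
    """
    if shooting_team == home_team:
        return home_score_state
    # The away team is shooting, so a home lead is a deficit for the shooter.
    return -home_score_state


# Hypothetical usage: the home team leads by one.
# shooter_score_state(1, "TOR", "TOR")  -> shooter is ahead by one
# shooter_score_state(1, "NJD", "TOR")  -> shooter is behind by one
```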
There is a Master Raw file that contains all of our data and the instructions for how to read the sheets. There are 30 csv files that contain each specific team’s games as well. Again, this is just the raw data for people to use as they wish. Data in more organized formats (totals, rates, percentages, etc.) will, of course, be released in the future. That requires a bit more work to put together, unfortunately.
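If you want to stitch the team files back together yourself, keep in mind that a game between two tracked teams may well show up in both teams' files. The sketch below reads every csv in a folder and drops exact duplicate rows. It assumes each file has a header row and that duplicated events are identical row for row, neither of which is promised here, and `load_team_files` is just an illustrative name.

```python
import csv
from pathlib import Path


def load_team_files(directory):
    """Read every .csv in `directory`, returning (header, rows) with exact duplicate rows dropped."""
    header = None
    seen = set()
    rows = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header  # keep the header from the first file
            for row in reader:
                key = tuple(row)
                if key in seen:
                    continue  # same event already loaded from another team's file
                seen.add(key)
                rows.append(row)
    return header, rows
```

If the Master Raw file already has everything you need, using it directly avoids the duplicate question altogether.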
If there are any other questions, don’t hesitate to reach out on Twitter at @RK_Stimp.