Delta Box Score: a model for predicting player scoring independent of teammate quality

 

Introduction

One of the greatest challenges in sports analytics is determining a player’s skill independently of the quality of his teammates. A number of tools already exist (e.g. WOWYs in hockey), but they are easily misused and come with significant limitations, including collinearity concerns. Regression-based approaches can provide a more rigorous alternative for isolating a player’s true talent.

An encouraging recent development in hockey analytics has been Ryan Stimson’s Passing Project, which you can read about here. The goal of this post is to introduce a regression-based method to estimate an NHL player’s expected scoring performance independently of the passing strength of his teammates. To this end, player and linemate data for the 2014-15 season, from Stimson’s Passing Project and Muneeb Alam, were used to devise a rate-based metric of a player’s projected goals. The difference between a player’s projected goals per 60 minutes and his actual goals per 60 minutes will be called Delta Box Score, or DBS.
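As a minimal sketch of the definition above: DBS is a simple difference of two rates. The post does not state which way the subtraction runs, so the sign convention here (actual minus projected, so that a positive value means the player outscored his projection) is an assumption, and the player names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str             # hypothetical label
    actual_g60: float     # observed 5v5 goals per 60 minutes
    projected_g60: float  # model-projected 5v5 goals per 60 minutes


def delta_box_score(p: PlayerSeason) -> float:
    """Actual minus projected goals per 60 (sign convention is an assumption)."""
    return p.actual_g60 - p.projected_g60


# Made-up values, not results from the post.
players = [
    PlayerSeason("Player A", actual_g60=1.10, projected_g60=0.85),
    PlayerSeason("Player B", actual_g60=0.60, projected_g60=0.75),
]

for p in players:
    print(f"{p.name}: DBS = {delta_box_score(p):+.2f}")
```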

Methodology

All data is 5v5

As a first step, each player’s passing statistics were weighted by how much ice time he shared with his linemates. Stepwise regressions were then performed for forwards and defensemen separately to determine which variables were significant enough to include. Listed below are the variables that were used:

  • For Forwards:
    • O-Zone Goal Generation per 60
    • Scoring Chance Goal Generation per 60
    • Shots Generated per 60
    • Goals Generated per 60
    • Corsi Contributions per 60
      • Player’s total offensive contributions via individual shot attempts and both primary and secondary passes that lead to shot attempts 
    • Goal Involvement per 60: Same as Corsi Contributions
    • SCC per 60
      • Scoring Chance Contributions per 60 (iSC from War on Ice plus SC SAG)
    • iCF/Team CF%
      • Percentage of Corsi events a player contributes to by way of primary passes leading to shot attempts
    • Expected Assists per 60
      • Series of weightings on passes that is based on the likelihood of that sequence resulting in a goal
  • For Defensemen:
    • O-Zone.Shot Attempt Generated per 60
      • Pass made in the offensive zone, but outside of Scoring Chance area
    • Scoring Chance.Shot Attempt Generated per 60
      • Pass sent into Scoring Chance (Home Plate) area and leading to a shot attempt
    • O-Zone Shots Generated per 60
    • SC.SG.60
      • Pass sent into Scoring Chance (Home Plate) area and leading to a shot attempt
    • Goals Generated per 60
    • Composite.SAG.60
      • Total Passing Contributions: All attempts generated from passes (primary and secondary)
    • Composite.SG.60
      • All shots generated from passes (primary and secondary)
    • Goal Involvement per 60: Same as Corsi Contributions
    • Scoring Chance Contributions per 60: (iSC from War on Ice plus SC SAG)
    • Shot Attempt Generation Efficiency (SAGE)
      • Proportion of shot attempts player generates that result in a shot or a goal. Exists in various forms, but SAGE itself is primary passing efficiency
    • OZ.SAGE: SAGE but in the Offensive Zone only
    • A2.SAGE: SAGE but with A2
    • OZ.C.SAGE: Total Offensive Zone Efficiency (OZ and SC areas)
    • Exp.A.60
      • Series of weightings on passes that is based on the likelihood of that sequence resulting in a goal
    • Entry.Assists.60
      • When a player generates a shot attempt in transition (pass made prior to zone entry), they are assisting on a controlled entry
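The ice-time weighting step described before the variable list can be sketched as a weighted average: each linemate’s per-60 passing statistic counts in proportion to the 5v5 minutes he shared with the player. The exact weighting scheme used for the post is not spelled out, so the function below and its sample numbers are assumptions for illustration only.

```python
def toi_weighted_average(linemates: list[tuple[float, float]]) -> float:
    """Average a linemate statistic, weighted by shared 5v5 ice time.

    Each tuple is (shared_toi_minutes, stat_per_60).
    """
    total_toi = sum(toi for toi, _ in linemates)
    if total_toi == 0:
        raise ValueError("player has no shared ice time with any linemate")
    return sum(toi * stat for toi, stat in linemates) / total_toi


# Hypothetical linemates: (shared minutes, Expected Assists per 60)
linemates = [(600.0, 1.2), (300.0, 0.6), (100.0, 0.3)]

teammate_exp_a60 = toi_weighted_average(linemates)
print(round(teammate_exp_a60, 3))  # 0.93
```

The linemate with 600 shared minutes dominates the result, which is the point of the weighting: a player’s teammate context is driven mostly by the linemates he actually played with.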

Using these variables, basic linear regressions were performed for forwards and defensemen separately to predict what an average player would produce assuming equal quality of teammates.
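For readers who want to see the mechanics, here is a rough sketch of an ordinary least squares fit using only the Python standard library. It is not the code behind this post: the two feature columns, the toy values, and the helper names (`solve`, `fit_ols`, `predict`) are all hypothetical, and the real models use the longer variable lists above.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features: list[list[float]], target: list[float]) -> list[float]:
    """Return [intercept, coef_1, ...] via the normal equations."""
    rows = [[1.0] + f for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs: list[float], f: list[float]) -> float:
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], f))


# Toy data: two hypothetical TOI-weighted teammate passing variables per player.
features = [[1.0, 0.30], [0.8, 0.45], [1.4, 0.20],
            [0.6, 0.50], [1.1, 0.35], [0.9, 0.25]]
actual_g60 = [0.80, 0.70, 1.05, 0.55, 0.90, 0.60]

coefs = fit_ols(features, actual_g60)
projected_g60 = [predict(coefs, f) for f in features]
residuals = [a - p for a, p in zip(actual_g60, projected_g60)]
print([round(r, 3) for r in residuals])
```

Under the sign convention of actual minus projected, these residuals are what DBS would measure for each player. Solving the normal equations directly is fine for a small illustration, but it is sensitive to strongly correlated predictors, so a statistics package would be the safer choice on the real data.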

Results

The following graphs illustrate how well predicted goals per 60 matched actual goals per 60, in-sample, for forwards and defensemen.

[Figure: Forward Results]

[Figure: Screen Shot 2016-01-28 at 11.15.27 AM]
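One common way to put a number on this kind of in-sample agreement is the coefficient of determination, R². The sketch below is only an illustration of that calculation; the values are made up and are not the fit statistics of the models in this post.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up goals per 60 for four hypothetical players.
actual = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]

r2 = r_squared(actual, predicted)
print(round(r2, 3))  # 0.968
```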

The importance of each teammate passing variable on individual scoring is shown below for forwards and defensemen:

[Figure: DBS Forward Model Weights]

[Figure: Screen Shot 2016-01-21 at 11.20.40 AM]

Discussion

My hope is that this is the very first step in devising player box score statistics that are independent of teammate strength, whether in passing or otherwise. In many ways, this is both a theoretical post and a work in progress. One reason is the limitation of the data itself. Ryan’s 2014-15 season data that I used is already sizeable, but more data (hopefully made available when this season concludes) is needed to run a more sophisticated model. While the preliminary results were encouraging, more testing on Ryan’s data needs to be done as well.

In the meantime, it is also my hope that this post will spark constructive discussion in the community around context-neutral box score statistics as a better alternative to existing tools. Please do not hesitate to comment below or reach out via Twitter @DTMAboutHeart if you have any recommendations on how to move this project forward.

Special thanks to Ryan Stimson, Muneeb Alam and WOI for the data used in this post, and Asmae for her guidance.

The results are posted in the link here and below: 

3 thoughts on “Delta Box Score: a model for predicting player scoring independent of teammate quality”

  1. I’m interested to hear more about the variables used; they seem like they might have multicollinearity issues. They don’t necessarily have to be discrete, but some of them will have considerable overlap, and I’m wondering if you have a way of avoiding too much double-counting.

  2. Secondary thought: I’m interested to see some out-of-sample testing on this. It’s a good start, but that out-of-sample stuff will really be where the rubber meets the road.
