Identifying Playing Styles with Clustering

Chemistry is an aspect of player performance that is discussed ad nauseam. Do certain players elevate their performance alongside a particular teammate because of some inherent ability to find each other on the ice, or to know what that teammate is going to do? Despite all the talk, very little has been done to analyze this phenomenon. In this piece, I posit that by identifying playing styles, something that’s been done in the NBA, we can quantify how well certain players will complement one another.

All data is from 5v5 situations in the 2015 – 2016 season and the current season, totaling almost 900 games from the Passing Project volunteers and Corey Sznajder. Special thanks to Asmae for her guidance throughout this piece.

I want to stress that this is a first foray into this type of analysis. The style names I’ve chosen are relatively arbitrary, and landing in one style rather than another doesn’t necessarily mean a player is better than another player. Players may have similar styles, but some will simply be more effective due to their ability. Finally, given that each day we accumulate more data, a player with a smaller sample size could find themselves in a different cluster in a future analysis.

Previous work by DTMAboutHeart has found that playing a team’s best players throughout a lineup is preferable to loading up a top line. So, if teams can play their best players apart from each other and the team benefits, two things are plausible: 1) The playing style of a team’s best players elevates their less-elite linemates; and 2) Teams have players that are not elite themselves, but whose playing style allows them to mesh with elite players better than others.

The former theory is further supported by Alex Novet’s recent work on hockey being a strong link game: having the best players throughout your lineup increases the chances of winning. The latter theory is what we will explore in this piece, and players like Teddy Purcell are who I have in mind. When thinking about roster construction, you want to know which players will complement each other the best in order to optimize the team’s performance, or as Kurt Russell said as Herb Brooks:

Methodology

In order to identify playing styles, we need to find a way to separate them based on a series of metrics. From the Passing Project data, I devised seven offensive metrics that explain how often, and in what ways, players contribute to offense. In order to not unfairly punish or reward players on teams that play slower or faster, I created indexes that weight each player’s rate by how much of the team’s on-ice offense that player is responsible for. For example, if two players both set up shots at similar rates, but one is more integral to their team’s shot production, that influence will show up in the index. Consider two players:

10 shot assists/60 minutes * 25% of team’s on-ice shots the player sets up = Index of 2.5

12 shot assists/60 minutes * 15% of team’s on-ice shots the player sets up = Index of 1.8

So, while the rate at which a player creates offense matters, it is important to account for how much influence they have on the team’s production.
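For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the index calculation. The function and argument names are illustrative, not taken from the original analysis; the only thing carried over from the piece is that a per-sixty rate is multiplied by the player's share of the team's on-ice events.

```python
def contribution_index(rate_per_60: float, share_of_on_ice: float) -> float:
    """Per-60 rate scaled by the fraction of on-ice events the player creates.

    share_of_on_ice is a fraction (0.25 means 25%), an assumed convention.
    """
    if not 0.0 <= share_of_on_ice <= 1.0:
        raise ValueError("share_of_on_ice must be a fraction between 0 and 1")
    return rate_per_60 * share_of_on_ice


# The two players from the example above.
player_a = contribution_index(10, 0.25)  # 10 shot assists/60, 25% share
player_b = contribution_index(12, 0.15)  # 12 shot assists/60, 15% share

print(round(player_a, 2), round(player_b, 2))
```

The second player has the higher raw rate, but the first ends up with the higher index because a larger share of the team's shots runs through him.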

The metrics are below. Each metric’s per-sixty-minute rate is multiplied by the percentage of the total on-ice events of that type that the player creates:

  • Shot Index (individual shots)
  • Shot Assist Index (primary shot assists)
  • Build Up Index (secondary and tertiary shot assists)
  • Transition Index (controlled entry assists)
  • Danger Index (shot contributions from below the end line or across the slot)
  • Influence Index (total shot contributions)
  • Pass Index (total shot assists)

Once I had these metrics, I performed cluster analysis using k-means to group players based on how similar their offensive styles were. K-means is a simple method for grouping players who are similar across several inputs. For a thorough explanation of k-means clustering, have a read of Conor’s piece on the subject here.
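To make the idea concrete, here is a small, self-contained sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code used for this piece: the toy players, the two-metric input, the z-score standardization step and every function name are assumptions made for illustration. In practice a library implementation would be the usual choice.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each column so no single index dominates the distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each player goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical players described by (Shot Index, Shot Assist Index).
points = [
    [3.0, 0.5], [2.8, 0.6], [3.2, 0.4],  # shoot-first profiles
    [0.5, 2.9], [0.6, 3.1], [0.4, 3.0],  # pass-first profiles
]
scaled = standardize(points)
centroids, labels = kmeans(scaled, k=2)
print(labels)
```

On this toy input the first three players land in one cluster and the last three in the other. Which cluster is numbered 0 depends on the random starting centroids, so the labels only carry meaning once you inspect each cluster's averages and give it a name.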

The Results

The model found four clusters to be optimal, and players were split by position prior to being clustered (I did run this with all players, but it clustered mostly along positional lines anyway). Let’s look at the forwards first and, again, these names are very simple so don’t take them too literally.

[Figure: Fwd_Type]

What you have here are the on-ice expected goal rates, which I demonstrated as being highly predictive of forward scoring, of each player type, as well as the average percentile ranking of each offensive metric used in the clustering model. The value of players within the Playmaker cluster stands out, and Shooters also have strong numbers. Identifying the best styles isn’t difficult, though. The harder part shows up when we look at the number of forwards in these clusters: these two types only account for, on average, four skaters on each team in a thirty-team league, so teams still have work to do to fill out a roster.
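The "average percentile ranking" per player type can be sketched as follows. This is an illustration under assumptions: the tie handling (ties share the midpoint rank), the sample values and the function names are all hypothetical, and the original work may have computed percentiles differently.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def percentile_ranks(values):
    """Percentile rank (0-100) of each value; ties share the midpoint rank."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)          # how many values are smaller
        ties = bisect_right(ordered, v) - below  # how many equal v (incl. itself)
        ranks.append(100.0 * (below + 0.5 * ties) / n)
    return ranks


def average_by_cluster(ranks, labels):
    """Mean percentile rank of one metric within each player type."""
    grouped = defaultdict(list)
    for rank, label in zip(ranks, labels):
        grouped[label].append(rank)
    return {label: sum(rs) / len(rs) for label, rs in grouped.items()}


# Hypothetical Shot Index values and cluster names for four forwards.
shot_index = [3.0, 2.0, 1.0, 0.5]
types = ["Shooter", "Shooter", "Playmaker", "Dependent"]

ranks = percentile_ranks(shot_index)
averages = average_by_cluster(ranks, types)
print(averages)
```

Repeating this for each of the seven indexes gives one row of averages per player type, which is the kind of profile the chart summarizes.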

The Balanced and Dependent clusters are where things get interesting and where we can use this method to identify which players should play with others. For all their shooting and influence, Shooter types don’t contribute to the most dangerous plays as much as Balanced types do, on average. Analytics is all about unearthing and exploiting market inefficiencies, so a player’s style, and whom he or she is likely to complement, should be analyzed prior to signing, drafting, or trading for that player.

[Figure: Purcell]

I mentioned Purcell above and his playing style is crazy balanced. He’s above average in each category (the rings are quartiles), so he’s someone who was comfortable playing various roles and complemented whoever he played with. Now, this isn’t to say that someone like Purcell is going to drive a line, but let’s recap what some research has shown us in the last six months or so.

Teams are incentivized to spread their best players throughout their lineup. This is due to the fact that a team can only have so much success with a stacked line (looking at you, Boston). Furthermore, with hockey being a strong link game, ensuring that the best players are on the ice as much as possible at different times gives you an advantage, or at least doesn’t put you at more of a disadvantage. The final piece is identifying which players can complement those elite forwards the best.

Too often we think only about whether or not a player is good, and not enough work is done on what environment that player could succeed in. Are some players better at certain styles of play? Can two players perform better together in a certain style than apart in a different one? So, I think this is a natural continuation of DTM’s and Alex’s work on optimal roster construction: identifying the middle-six type forwards to pair with the elite play-drivers. I do wonder if we’ll have teams using fifteen players during 5v5 situations and keeping three on the bench for PP and PK work or to rotate with the other players due to fatigue. But, that’s an idea for a post at another time.

These are only offensive playing styles, but if we look at the expected goal shares with each type on the ice, these follow a similar pattern.

[Figure: Fwd_xG_Share]

Even though we are only identifying the offensive playing styles of these forwards, we still see that three of the four come out on top when looking at both expected goals for and against. The main reason a Dependent player would suffer is that if they depend on another player to help create offense and cannot do it on their own, they will likely spend much more time in their own end, leading to a negative goal differential.

Now let’s move to the defensemen and look at the same things.

[Figure: Def_Type]

Given this sample of data, there are only eleven defensemen that fit the All Around cluster. These are mostly your perennial Norris candidates like Duncan Keith, Erik Karlsson, Drew Doughty, Victor Hedman, Kris Letang, etc., along with some players you would not expect. There isn’t as much of a sample on some of these surprises as on the more established players, but their inclusion is nevertheless intriguing. After all, if new analytical tools don’t surprise us at least a little, we aren’t pushing the envelope enough. The fact that we’re getting the best of the best in one group does validate the model for the most part. Remember, these are just styles of play, not necessarily how good a player is. I have a feeling a select few users on Twitter will intentionally forget this key point to make some noise. I wish you good fortune in the Twitter Wars to come.

Chris Tanev and Niklas Hjalmarsson are often referred to as excellent defensive defensemen that don’t do much offensively. This season, Tanev has 0.31 points per sixty minutes, while Hjalmarsson has 0.66. These are not gaudy totals and, given their reputations, it’s unsurprising people classify them in this way.

However, by understanding how players, particularly defensemen, can contribute to offense, we can plainly see the hidden value in Tanev over Hjalmarsson.

[Figure: Tanev_HJalmarsson]

Tanev is in blue and Hjalmarsson is the purple underneath. Both players are near the bottom of the league in terms of their individual shot volume and do well in the build up phase of shot sequences, but Tanev wields much greater influence than Hjalmarsson. While both suppress shots at impressive rates relative to their teammates this season (Hjalmarsson at -4/60, Tanev at -6/60), Tanev is the more complete player and the preferred option.

There are some closer groupings among the defensemen, given that they will have fewer contributions overall than forwards. How does each playing style perform when we look at their on-ice expected goal shares?

[Figure: Def_xG_Share]

The All Around types are much higher, but we see a narrow gap (0.9%) between the volume shooter types and the puck-moving defensemen. But, should we play two shooters on the same pairing? Will a balanced forward work better with other balanced forwards? Or do they need at least one shooter on the line? We’re not done yet.

Optimizing Lineups

The logical question that arises when looking at this data is, “What types play best together?” Let’s have a look at the forward line combinations.

[Figure: Fwd_Lines]

These are all combinations of the four styles regardless of position, so a row is not ordered as LW – C – RW. Reading across the second row, for example, it covers any line where one Balanced type played with two Playmaker types. A few things jump out: 1) three Balanced player types sit a few notches higher than three Shooter types; 2) the ideal bottom six reveals itself, with the line of three Balanced types directly above the line with two Balanced types and one Shooter; 3) a Balanced player can play with two Shooters and still perform exceptionally well, freeing up the team’s Playmaker to anchor another line; and 4) Dependent players are next to useless.
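A table like this can be built by treating each line as an unordered combination of styles and pooling expected goals for and against. The sketch below assumes a hypothetical input format, one record per stint with the three forwards' styles plus on-ice xGF and xGA, and made-up numbers; it is not the code or data behind the chart.

```python
from collections import defaultdict


def xg_share_by_combo(stints):
    """Pool xGF/xGA per unordered trio of styles and return xG share in percent.

    stints: iterable of (styles, xgf, xga), where styles holds three style names.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for styles, xgf, xga in stints:
        key = tuple(sorted(styles))  # ignore LW/C/RW order
        totals[key][0] += xgf
        totals[key][1] += xga
    return {
        key: 100.0 * xgf / (xgf + xga)
        for key, (xgf, xga) in totals.items()
        if xgf + xga > 0  # skip combos with no expected goals either way
    }


# Made-up stints: the first two are the same combination in a different order.
stints = [
    (("Balanced", "Playmaker", "Playmaker"), 1.2, 0.8),
    (("Playmaker", "Balanced", "Playmaker"), 0.9, 0.6),
    (("Dependent", "Dependent", "Shooter"), 0.4, 0.6),
]

shares = xg_share_by_combo(stints)
for combo, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(combo, round(share, 1))
```

Sorting the styles before using them as a key is what makes "one Balanced with two Playmakers" a single row no matter who plays center. The same approach works for defense pairings with two styles per key.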

By identifying these playing styles/player types, you can see the compounding effect of players blending their skills together. There is a synergistic effect occurring here. You could be more likely to produce “chemistry” if two players either complement each other’s playing style, or if they are balanced enough to feed off of each other. In theory, a team could have two Playmakers, two Shooters, and fill in the rest of the roster with Balanced types and never go below an expected goal share of 50%.

Now let’s look at defense pairings.

[Figure: Def_Pairs]

Unsurprisingly, the All Around and Volume Shooter types dominate near the top, but the Puck-Mover type also acquits itself well, never falling below 50% except when paired with the Defensive-Oriented type. Many pairs consist of two Defensive-Oriented players, which compounds their inability to contribute offensively because neither has a partner to back them up properly.

Nearly 25% of all shots at 5v5 came with two defensemen of the Defensive-Oriented type playing as a pair. This is yet another sign that teams struggle with identifying quality defensemen or overvalue things peripheral to the role.

Conclusions & Future Work

Identifying how a player contributes to offense provides options to put lines together in optimal ways. The same goes for defense pairings. To enhance this, individual performance could be taken into account in addition to playing style, creating tiers within the types. Defensive metrics could also be included to further separate and more correctly classify players.

For example, earlier this season, I could have given you two reasons why the Wild could contend in the playoffs and two why they might not. They don’t have a Dependent player type in their top nine or a Defensive-Oriented defenseman in their top five; they also only have one Playmaker in their top nine and lack an All Around type on the back end.

However, the addition of Martin Hanzal gave them another Playmaker type on their roster. This creates possibilities.

According to LeftWingLock, Minnesota’s top three lines have looked something like this over the last ten games.

[Figure: LWL]

Based on the playing styles of these forwards, the first and third lines have expected goal shares of 55% and the second line is at 50.9%. If you swap Staal for one of Pominville or Niederreiter, the third line only drops to 54.7% and the second line jumps to 52.2%. So, the two affected lines gain 0.5 percentage points on average, or roughly 0.3 points across all three lines if you’re dividing ice time equally between them.
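The arithmetic behind a swap like this can be checked with a short sketch. It uses the rounded percentages quoted above and, by default, equal ice time for every line; the function name and the optional weights argument are illustrative assumptions rather than part of the original analysis.

```python
def mean_share(line_shares, ice_time_weights=None):
    """Ice-time-weighted average of line-level expected goal shares (percent)."""
    if ice_time_weights is None:
        ice_time_weights = [1.0] * len(line_shares)  # assume equal ice time
    total = sum(ice_time_weights)
    return sum(s * w for s, w in zip(line_shares, ice_time_weights)) / total


before = [55.0, 50.9, 55.0]  # lines 1-3 as currently deployed
after = [55.0, 52.2, 54.7]   # lines 1-3 after the Staal swap

change_all_lines = mean_share(after) - mean_share(before)
change_two_lines = mean_share(after[1:]) - mean_share(before[1:])
print(round(change_all_lines, 2), round(change_two_lines, 2))
```

With these rounded inputs, the second and third lines gain 1.0 point between them, which is 0.5 averaged over those two lines and about 0.33 averaged over all three. Passing real ice-time shares as weights would shift the result toward whichever line plays more.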

Now, again, I must state that within each playing style certain players will undoubtedly be more effective than others, so it’s not as simple as what I’m saying here; however, I do believe that this approach is the right one to take to further work on optimizing lineups and making smarter decisions on roster construction.

Finally, understanding player styles can enhance the decision-making of a coach regarding the on-ice assignments they give to their players. Knowing a player is far better at contributing in a certain area of the ice can help players play to their strengths, and tactics can be tweaked to get the most out of a coach’s players. A player that excels at transition play is probably the player you want swinging a bit deeper during a regroup, for example.

If you wish to access the Tableau with these radar charts and playing styles, that is available here.

8 thoughts on “Identifying Playing Styles with Clustering”

  1. This is brilliant. I think this is the start of something ground breaking. It’s easy to apply, easy to visualize, and perhaps most importantly, the criteria can be directly linked to “eye test stuff”. Might help for all hockey people to understand and apply. Keep up the great work and thanks for the content.

  2. This is absolutely fantastic. When hockey analytics reaches its next milestone in acceptance and useful application by actual teams (whatever that looks like) I think people will point to this work. Great read, can’t wait for more.

  3. This is fantastic and a really interesting read! It’s a lot more intuitive than many other ways of trying to analyze successful lines, and goes further into the whys of success than “well, those two players have had good possession numbers together before”.

    I have to admit, I’m curious–you listed five of the eleven defensemen who met your “All-Around” criteria and said some of the other six were a surprise. I can understand not wanting to list dozens of players in any of the other categories, but would you be willing to share who those other six all-around defensemen are, or is it left to the reader to discover?

    Either way, thanks so much for your work and for sharing it!

  4. Interesting stuff! Reminds me a bit of personality categorizations…. I remember some Isles fans a couple years back who were upset when Grabovski was placed on a scoring line (largely because he couldn’t finish very well), but he helped move the puck in the right direction, and when he was placed with scoring talent (Tavares, Lee, Okposo, Nelson, Strome, as well as Kessel back in TOR) CF60 was sky-high…. Is using shots as a measurement rather than goals putting “shooters” at a disadvantage for GF% compared to “playmakers”, if they are particularly good goal-scorers? I know SSS for goals single-season, but pass-assists rate seems like it should mostly stabilize in goals-produced over a career (particularly with a variety of linemates), whereas shooting% can vary from player to player over a career. I find it odd that three playmakers would typically perform better in GF% than two playmakers and a shooter…. Looking forward to future updates!

  5. Love the article. I want to do some personal analysis for the lines on the Rangers the same way you did the analysis for the Wild. Do you have a list of which category all players are in or another way of doing that analysis? Thank you.
