Wins Above Replacement: History, Philosophy, and Objectives (Part 1)

Wins Above Replacement (WAR) is a metric created and developed by the sabermetric community in baseball over the last 30 years – there’s even room to date it back as far as 1982, when a system that resembled the method first appeared in Bill James’ Abstract from that year (per Baseball Prospectus and Tom Tango). The four major public models/systems in baseball define WAR as follows:

“Wins Above Replacement (WAR) is an attempt by the sabermetric baseball community to summarize a player’s total contributions to their team in one statistic.” FanGraphs
“Wins Above Replacement Player [WARP] is Prospectus’ attempt at capturing a players’ total value.” Baseball Prospectus
“The idea behind the WAR framework is that we want to know how much better a player is than a player that would typically be available to replace that player.” Baseball-Reference
“Wins Above Replacement (WAR) … aggregates the contributions of a player in each facet of the game: hitting, pitching, baserunning, and fielding.” openWAR

As each of these simple definitions more or less states, WAR is a system, model, or technique that attempts to assign a total value to every player, representing in a single number how much that player contributed to his or her team. This single number is composed of multiple components, each of which isolates a given area of play within a given sport. In baseball, these components are different for batters and pitchers, but the summation of the components is WAR’s attempt to encapsulate the total value a player added to their team. The idea of WAR in hockey, while not new, is definitely still underdeveloped.

The concept of WAR, however, feels a bit like the “holy grail” in hockey. Many have tried their hand at creating a model like this or one that has similar goals – often these people have been some of the leading voices in hockey statistics. The “Single Number Dream” has been as elusive as really anything else in hockey statistics it seems – and for good reason. A WAR model (for any sport) poses several incredibly important questions about how we as analysts evaluate players. WAR is not really about the single number, ironically enough. It’s about the way we arrive at that number. This number, as many have said before, is an estimate at best; it is not definite, it has uncertainty and assumptions, an implied “range” surrounds each number for each player.

While this ambiguity is often overlooked, arriving at the final number isn’t exactly easy. But it’s the process and the questions and ultimately the philosophy that make the search for the single number such an important aspect of sports statistics. How should we evaluate players? How do we combine the many aspects of such complex games, on the same scale and adjusted in just the right way, so that we can confidently compare even-strength offense to even-strength defense, or a stolen base to a double, or a rebound to a 3-pointer? In our opinion, the process of finding the answers to these questions is just as important as what the single number actually tells us. So what has hockey done with this? We’re going to revisit baseball shortly, but since this is hockey let’s cover the prior work our sport has to offer.

Prior Models in Hockey

There have been several WAR methods created and used in the past to evaluate both NHL skaters and goalies. These weren’t all called “WAR” necessarily, but in some form each attempted to evaluate the entire value of a skater (or sometimes a goalie). WAR-on-ice’s write-up has a more complete history so please reference the link below, but we wanted to highlight a few of the more well known examples. A note: except for Emmanuel Perry’s, none of these models are up to date or “live” as of publishing:

It appears the first model/system that attempted to evaluate hockey players in a way similar to WAR was Alan Ryder’s Player Contribution method from August, 2003.
Michael Schuckers and James Curro created a player evaluation model in 2012 (updated in 2013) called ThoR (Total Hockey Rating). While this system is not current, it appears the data is still available here.
The now-former team at war-on-ice.com (Andrew C. Thomas, Sam Ventura, and Alexandra Mandrycky) created their WAR model in the fall of 2014 and hosted it on their site. The entire series explaining the model is still available online here. It is fantastic and a great reference for any and all WAR-related hockey discussion.
Dawson Sprigings developed a WAR model that was released in the summer of 2016 and was in production for the entire ’16-17 season. The 5-part series was hosted on Hockey-Graphs but is no longer available.
Emmanuel Perry created his own version of WAR in the summer of 2017 and posted an introduction to the concept of WAR here. His in-depth explainer of the model can be found here. This model is available on corsica.hockey.
Gordon Arsenoff presented his WAR model at the 2018 RITSAC conference. His slides can be found here. It doesn’t appear that this model is currently available publicly.

While we recommend you take the time to familiarize yourself with the above prior work, we’re going to focus on War On Ice (WOI), Dawson Sprigings, and Emmanuel Perry’s respective models to illustrate how some of the prior methods worked and what their respective philosophies looked like.

War On Ice WAR

This was the first model in hockey that labeled itself “WAR” – that is, its goal was to measure player contribution in terms of wins. This is important. In a very detailed and thorough (and open-source!) way, they produced something that actually resembled what a WAR model could be in hockey. Here are a few quotes from their series to briefly demonstrate their own philosophy:

This system should be forward-looking; that is, no new information intrinsic to the system should affect our estimates from the past. I want this to be based on a predictive idea so that past performance is indicative of the (immediate) future.
Every piece should be linearly decomposable into its constituent parts.
… everything should be validated based on its ability to predict future outcomes on a grander scale. We shall not judge based on eyeball fit but by overall measures of predictive scale.
From part 2: The relative value of an agent — a team, player, combination of players, or circumstance — is how they change the rate at which events occur, for and against. This is of course meaningless to our purpose without Point 2: The only events that matter are predictive or indicative of a goal being scored.
From part 5: Measuring WAR is as much about context as it is about performance. Since our goal is to value measures that are predictive of future performance, a team that plays against strong opposition should be compensated because any baseline team would do worse in expectation; a team that plays a series of games at home with sufficient rest should expect to do worse than their record suggests when they’re on the road.

WOI’s WAR model was set up with a fundamental philosophy – one extremely important to understanding what it measured: the model was intended to be as predictive as possible. This makes sense given the analysis, research, and literature in hockey statistics. We often attempt to remove the noise and randomness from the game by focusing on things we know are predictive or indicative of future success. Unfortunately, this model wasn’t live for very long (as Ventura was hired by the Pittsburgh Penguins and Thomas and Mandrycky were hired by the Minnesota Wild).

There were, however, some great articles that dealt with this model. Dom Luszczyszyn wrote an article in October, 2015 for the Hockey News with quotes from AC Thomas, Ryan Stimson, and Corey Sznajder that talked about the model while it was still available. Additionally, this article from Vice covers both WOI and Perry’s site, and while not really WAR-specific, it’s an interesting read and gives you an idea of the timeline(s) surrounding public data. Cam Lawrence’s fantastic “How to Build a Contender” series used WOI’s WAR model to cover how an organization should build a contending team. Original Six Analytics has a good overview of the model as well here.

Dawson Sprigings

As mentioned, Sprigings released his WAR model in the summer of 2016 and kept it current throughout the ’16-17 season. While the 5-part series is no longer available, we can say (from memory and many old CSVs) that we have a pretty good idea of how it worked. It appears this model was similar to Jeremias Engelmann’s Real Plus-Minus (RPM) metric used to evaluate players in the NBA, which itself was based on the various Adjusted Plus-Minus metrics and variations (this was covered in our RAPM article and will be covered in part 2 [future link] as well). It would be both unwise and a disservice if we attempted to summarize Sprigings’ model in detail without a public write-up, so we’ll avoid that.

This model was similar to WOI (which is similar to the various baseball models) in that it approached aspects of the game independently as “components” and combined these to arrive at a single number. However, while WOI’s model was constructed with prediction as the main focus, Sprigings took this a step further: he emphasized evaluating players based on true-talent or true-value. This is a common concept in sabermetrics – the question of what a player’s “true talent” actually is. As mentioned, the model was current for one season, and it generated an incredible amount of content and discussion during the ’16-17 season (some of the juiciest bits are no longer public unfortunately). While this was often overlooked, at any given time in a season, the model was (from what we can remember) constructed to evaluate a skater’s true talent level – the same goes for end of season totals as well.

Here are a few articles that dealt with Sprigings’ WAR model while it was still active:

Arvind Shrivats covered it here.
We (Josh and Luke) used this model to construct aging curves for NHL skaters [here].
We also explored the model using rate statistics [here].
Alex Novet discussed strong and weak link teams using this model [here].
For a bit of history into prior debates, here is an off-the-cuff article from Matt Cane regarding the debate surrounding Sprigings’ model from April, 2017.
Finally, Sean Tierney still has the data available via tableau if you’d like to dig through it.

Emmanuel Perry

Emmanuel Perry’s model is the only other live model that is currently available, found here. He’s provided both an introduction to the idea of WAR in hockey here and an in-depth explanation of the methodology here. This model is structured in a similar way to WOI’s model, but uses corsica’s xG model instead of relying on shots and danger zones. As with WOI’s model, please take a look at both of the above linked articles as he does a much better job explaining his model. Perry has publicly stated this model was not constructed to be inherently predictive or descriptive – it’s probably best to think of it as in between. This model is not available on a daily basis in-season due to the time and computational constraints, so all of the data available are generally historical in nature.

A Few Notes

We had a decision to make back in May, 2018: A.) Present our WAR model at RITSAC 2018 and focus on building and creating our website to house this WAR model (among other things – www.evolving-hockey.com) or B.) Write the entire series you’re reading now. Since we had never attended a sports analytics conference (among numerous other reasons), we went with option A. This presented, of course, a series of compromises we had to make. The first being that we knew this write-up wouldn’t be finished until well into the ’18-19 season. The second was the fact that this new WAR model would be fully public and we may be asked to explain it without a proper reference for its construction. Both of these came true. We want to present a couple pieces that referenced our WAR model for continuity’s sake.

First, here are the appropriate links to both our RITSAC 2018 presentation video and slides (big thanks to Ryan Stimson and Matt Hoffman): slides and presentation.
Arvind Shrivats wrote a great WAR explainer for The Athletic (paywall) here that dug into both our model and Perry’s model.
The Athletic hosted several WAR articles/discussions/debates that went into the current public models to respective extents (again, all articles are paywalled). The first generated quite the debate on twitter, the second featured Brian MacDonald, and the third was curated by fellow HG writer Ryan Stimson with Michael Schuckers.
John Fischer discussed both our model and Perry’s model back in August.
CJ Turtoro wrote about our model (also in regards to the Devils) in October, 2018.
We (actually just Josh) contributed to an article on the Athletic written by Shayna Goldman that was Rangers-focused but covered a lot of questions about the model and how it can be used/viewed.

There have been a few other articles that looked at both our model and Perry’s as well. With that, we must apologize for the time it took us to finish this series. As we mentioned, we made a choice, and that left us with an unattainable deadline to finish everything – you know, priorities and all that. With that said, let’s get back to it: Baseball.

Baseball WAR

This topic has had books written about it, most teams use something that resembles WAR to a certain degree, and there are many that know far more about this subject than both of us. But, we need to discuss how baseball’s WAR models work (or at least try) simply because they were a major influence on both how we think about WAR and how we constructed our model. Additionally, we feel it’s important that we clearly draw the connection between baseball and hockey and how a WAR model can exist in both sports. At the beginning of this piece, we laid out a brief summary of the various public WAR models in baseball (FanGraphs, Baseball Prospectus, Baseball-Reference, and openWAR). Of course these were 1-2 sentence quotes that are far from comprehensive, but each gives us a good idea about what they are all attempting to measure: the total value a player added to his/her team relative to a replacement level player in one number.

This, however, brings us to a bit of a crossroads with how we approach building a WAR model for the NHL. If you haven’t noticed, there’s been little discussion in baseball’s WAR explainers (read: none) regarding two crucial concepts that hockey statistics relies quite heavily on: repeatability and predictiveness. In the field of hockey statistics, the idea of a metric being repeatable or predictive is one that has become foundational. That is to say, metrics are often “validated” on their ability to do one or both. In our opinion, the main reason Corsi (shot attempts) caught on and became such a fundamental idea in hockey work was due to its ability to better predict team wins. Expected Goals used this concept as well (Sprigings xG explainer). It doesn’t take much when researching the methods used in modern hockey work to find mention of one or both of these concepts.
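As a rough illustration of how those two ideas are commonly checked (this is a generic sketch, not the validation used by any model named in this piece), repeatability is often measured as the correlation between a metric in one sample and the same metric in a later sample, while predictiveness is the correlation between the metric and a later outcome such as goal share. The function name and the team numbers below are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical first-half and second-half numbers for five teams.
shot_share_first_half = [52.0, 48.5, 50.0, 55.5, 45.0]
shot_share_second_half = [51.0, 49.0, 50.5, 53.0, 47.0]
goal_share_second_half = [53.0, 47.0, 49.0, 54.0, 48.0]

# Repeatability: does the metric correlate with itself in a later sample?
repeatability = pearson(shot_share_first_half, shot_share_second_half)

# Predictiveness: does the metric correlate with a later outcome?
predictiveness = pearson(shot_share_first_half, goal_share_second_half)

print(f"repeatability:  {repeatability:.3f}")
print(f"predictiveness: {predictiveness:.3f}")
```

A metric that scores well on both checks is the kind that hockey work has tended to favor; a purely descriptive metric makes no promise on either.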

As we’ve attempted to show above, prior work with hockey WAR took this same mentality as a core aspect of how prior WAR models were built. And there’s a very good reason for this. We’re not in any way trying to criticize this approach or suggest that it is wrong – it’s not. These concepts are crucial in how we value a lot of aspects of the game, how we weed out and deal with luck, how we place confidence in players and teams for evaluation, the list goes on and on… But how do repeatability and predictiveness fit into Wins Above Replacement in hockey? That, right there – that’s the question.

Baseball WAR models are descriptive in nature; as Baseball-Reference’s explainer puts it, “the idea behind the WAR framework is that we want to know how much better a player is than a player that would typically be available to replace that player.” -OR- as the FanGraphs overview puts it “WAR is not meant to be a perfectly precise indicator of a player’s contribution, but rather an estimate of their value to date.” While there’s a lot to unpack with both of these, the last part of FanGraphs’ quote is a key aspect of baseball’s WAR models: value to date. To put this a little more eloquently, let’s defer to Baseball Prospectus. In 2013, BP released their Reworking WARP series – it’s incredible (part 1, part 2, part 3, part 4, part 5). In their introduction to this series, Colin Wyers described several of their goals with their new WAR model. This was the third goal:

“We want to know what a player has done. To use the technical terms of statistics, we view a player’s performance in a given time period as a population, not a sample. If you redid that sample a thousand times, that player could have done a lot of things. If you look at other samples, it’s very likely that this player has done different things. It doesn’t matter. We aren’t interested in what a player could have done, but what he actually did.”

Almost every one of the 5 goals Wyers lays out in this series (part 1) is in line with how both of us feel about WAR for hockey. What we’re trying to demonstrate here is that there has, so far, been a disconnect between what WAR in baseball is and what WAR in hockey should be. While not every WAR model or similar single number method developed so far in hockey has strayed from the ideas or concepts that the major public baseball WAR models hold, the vast majority of them have. That’s to say, they’ve jumped to something that we would consider to be an Expected Wins Above Replacement model, or possibly something that resembles the newer “Deserved” metrics from BP’s new WARP (DRA, DRC+), or, going even further, something that incorporates the new Statcast data into an xWAR-type model. Dave Cameron discussed this in a post on FanGraphs two summers ago when the new Statcast data started arriving. It’s a fascinating read as it deals with very similar questions to what we’re addressing here (even without the rumored future player tracking data!). Cameron’s conclusion feels quite relevant:

But while Statcast holds a lot of promise for improving the pitching and defensive sides of the components, getting ever-more granular hitting data might force us to again ask what we want WAR to be, and what the goal of the model is. There is no obvious right answer here, and that’s one of the reasons there will always be multiple ways of calculating WAR.

“There is no obvious right answer here” is an important point of emphasis: multiple WAR models can exist in any sport, each with different frameworks based on different philosophies. The prior WAR models in hockey have focused on prediction and true-talent evaluation. We, however, wanted to dial that back a little bit and create a model more in line with those defined in baseball. We’ve attempted to make a descriptive WAR model, through and through. This doesn’t mean it is not predictive, it just means that we don’t care whether it’s predictive.
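To make the descriptive versus true-talent distinction concrete, here is a minimal sketch using a single shooter. It is not how our model or any prior model works; it simply contrasts crediting what happened with a generic shrinkage estimate that pulls the observed rate toward a league-average prior. The function names and all of the numbers (including the 30 goals on 300 shots prior) are assumptions chosen for illustration.

```python
def observed_rate(goals: int, shots: int) -> float:
    """Descriptive view: the shooting rate the player actually produced."""
    return goals / shots


def regressed_rate(goals: int, shots: int, prior_goals: int, prior_shots: int) -> float:
    """True-talent style view: blend the observed results with a
    league-average prior, expressed as extra 'phantom' shots and goals."""
    return (goals + prior_goals) / (shots + prior_shots)


# Hypothetical season: 30 goals on 200 shots.
goals, shots = 30, 200

print(observed_rate(goals, shots))  # 0.15
# Hypothetical prior worth 300 shots at a 10% league-average rate.
print(regressed_rate(goals, shots, prior_goals=30, prior_shots=300))  # 0.12
```

A descriptive model credits the player with all 30 goals; a true-talent model would say that a shooter of this estimated quality "should" have scored fewer on the same 200 shots. Neither answer is wrong, they answer different questions.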

Philosophy and Goals

The public WAR models in baseball are inherently descriptive – they measure what a player did; how a player added value or contributed to their team in a given span of time in a way that directly ties back to what wins games (runs). Broadly, WAR doesn’t care about repeatability or whether it is by itself predictive (there are of course exceptions here – for instance FanGraphs’ version uses Fielding Independent Pitching (FIP) for pitching WAR instead of ERA/RA9 the way Baseball-Reference does as it better accounts for the pitcher’s inability to influence the defense behind him). This idea is one that goes against many conventions in the hockey statistics community, but at its core, WAR is a descriptive metric. Given this fact, we were faced with a choice with the construction of our model: what do we do with this? The original WAR model created by WOI approached it this way to keep it simple. This was quoted above but we feel it’s important we emphasize this:

This system should be forward-looking; that is, no new information intrinsic to the system should affect our estimates from the past. I want this to be based on a predictive idea so that past performance is indicative of the (immediate) future; my only exception to this would be if we learned of bias in the data which needed to be corrected after the fact.

To be clear, hockey is a very random sport – luck is a major factor that plays a large part in a player’s contribution or value. Often a player can sustain performance above their “true-talent” level for long stretches of a season (sometimes a full season). This is definitely problematic. But baseball is often not that different – luck plays a major role in a player’s performance as well. Prior WAR and single number metrics in the NHL were (and are), for the most part, concerned with their ability to predict future performance or evaluate true-talent. While these ideas are extremely important, we feel a WAR model that attempts to better capture a player’s actual value-to-date was needed to evaluate NHL players. Here are our goals with this model:

We want to create a model that will, to the best of our ability, evaluate how a skater or goalie in the NHL contributed to their team. This model or system should cover as many aspects of the game as we are able to account for, adjust for all contexts and situations, and adjust appropriately for teammates and (to a lesser extent) competition. We want a number (or numbers) that best isolates and attributes the value a player added or contributed to their team.
We will try, whenever possible, to use Goals as the basis for our method(s). Like baseball, we want this model to directly tie to Wins, and goals are how teams win games.
To better help fans and analysts understand this model, we want a system that will allow us to update and maintain daily WAR numbers within a season. While small-sample WAR numbers are problematic, it is important that we as fans and analysts can follow the progress of players throughout a season and evaluate both how the model functions and what it says about player performance day-to-day.
This model should be able to evaluate rookies and first-time NHL players the same way it evaluates veterans. Given that our goal with this model is to best describe and assign value to a given player’s performance, there should be no difference in evaluating a player who has many years of NHL experience versus one who has no NHL experience.
We should be able to analyze the inner-aspects of the model to provide context for any given player’s number or component(s). In other words, this model should not be completely black-boxed or uninterpretable. While this may be difficult, we would like to have the ability to investigate why a player’s WAR (or specific component) looks the way it does.
And finally, while a bit petty, we’d like this model to align with and better track the skaters and goalies that should win the end-of-year awards. Ultimately, WAR should be a data-based analysis of a player’s total contribution. It only makes sense that the Hart Trophy, for instance, should be based on a model that evaluates the complete performance of a player in a given season.
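The goals-based and interpretability goals above can be sketched together in a few lines. This is only an illustration of the general shape of a component-based model: the component names, the sample values, and the goals-per-win constant are all hypothetical placeholders, and the actual components and win conversion are covered in parts 2 and 3.

```python
from dataclasses import dataclass

# Hypothetical conversion rate, used here only so the example runs.
GOALS_PER_WIN = 6.0


@dataclass
class PlayerComponents:
    """Goals above replacement for each (hypothetical) component of play."""

    even_strength_offense: float
    even_strength_defense: float
    powerplay_offense: float
    shorthanded_defense: float
    penalties: float

    def goals_above_replacement(self) -> float:
        # The total is a plain sum, so every component stays inspectable.
        return (
            self.even_strength_offense
            + self.even_strength_defense
            + self.powerplay_offense
            + self.shorthanded_defense
            + self.penalties
        )

    def wins_above_replacement(self, goals_per_win: float = GOALS_PER_WIN) -> float:
        # Goals are converted to wins with a single scaling factor.
        return self.goals_above_replacement() / goals_per_win


skater = PlayerComponents(
    even_strength_offense=9.0,
    even_strength_defense=1.5,
    powerplay_offense=4.5,
    shorthanded_defense=-0.5,
    penalties=0.5,
)
print(skater.goals_above_replacement())  # 15.0
print(skater.wins_above_replacement())   # 2.5
```

Because the total is just the sum of its parts, a surprising WAR figure can be traced back to the component that drives it.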

Having stated all of these goals, it’s important that we be clear here. We’ve described the prior hockey WAR models as “Expected WAR” models to an extent. This is not a bad thing – it actually might be the better option given the amount of luck and variance that occurs in hockey. In our eyes, however, there are numerous benefits to building a model like the one we’ve attempted to create. The great thing about WAR is that it is a framework – there is no single correct version. Parallel models, especially descriptive vs. predictive versions, allow for even more insight into player evaluation. We might even make another version in the future that looks nothing like this!

In this part, we’ve covered some of the history found in prior hockey work, discussed some of the differences in baseball and hockey’s respective philosophies, and outlined our goals for our WAR model. Along the way we’ve linked to quite a few articles that are all relevant to what we will discuss in the following two parts. Please take some time to read over what is referenced here. We feel understanding the history, theory, and philosophy of WAR is quite important. In part 2, we’ll cover the entire process of how the model is built. In part 3, we’ll cover replacement level and win conversion, cover some additional concepts relating to decisions we’ve made along the way, and attempt to tie all of this together gracefully!

3 thoughts on “Wins Above Replacement: History, Philosophy, and Objectives (Part 1)”

Tom Awad’s Goals Versus Threshold (GVT) and Iain Fyffe’s Point Allocations came out at roughly the same time as Ryder’s Player Contribution. Tom also came out with Delta years later and Justin Kubatko came out with Point Shares.

There’s a guide to creating catch-all stats in Hockey Abstract (2013) with some of the early history, and there’s a comprehensive glossary of all hockey stats in “A Fan’s Guide to Hockey Analytics” – all with footnotes.


Rob V
January 16, 2019 at 10:20 am
There’s a bunch on repeatability/predictive in baseball WAR. Tango discussed it quite often.


Kurt
November 18, 2020 at 6:32 am
I love your articles and I’m a big believer in the xGF and xGA models in soccer and hockey. Without knowing too much about baseball, what would be the one metric in baseball that is most similar, i.e expected runs for, and expected runs against?

There seem to be so many advanced baseball stats and it can be hard to wrap one’s head around all the technical language


Brook
February 23, 2021 at 3:11 pm

Hockey Graphs

Visualizing and analyzing hockey and statistics
