An All About the Jersey Hockey Stat Primer: Goals Above Replacement & Game Score


In the final part of the 2020 All About the Jersey hockey stat primer series, this post summarizes two catch-all statistics: Game Score by Dom Luszczyszyn and Goals Above Replacement (Wins Above Replacement) by Josh and Luke Younggren of Evolving-Hockey.

St. Louis Blues v New Jersey Devils
If memory serves, Mirco Mueller was more valuable than P.K. Subban last season according to GAR / WAR. But I didn’t pay for access to Evolving Hockey so I could be misremembering that nonsense.
Photo by Andy Marlin/NHLI via Getty Images

Over the past two months, I have written up primers about various hockey stats that have grown in usage over the past decade or so. They have been instrumental in helping fans and team personnel better evaluate performances, players, and teams with more objective values than just observations and experience. However, to state whether some player or team is good or not requires an understanding of a bunch of different stats as well as the nuances and contexts that go with them. There have been attempts at developing catch-all stats that get to the point with one convenient value to answer questions such as: Does this player make their team better? Does this team have enough valuable players in all positions? This primer will focus on two that are still available - to a point - that attempt to do just that: Goals Above Replacement (GAR) and Game Score (GS).

A Brief History Lesson

While WAR and GS have been around and used by many over the past four years or so, they are by no means the only attempts at making a stat like this. There have been multiple attempts to incorporate the logic of baseball’s Wins Above Replacement (WAR). Pro sports is all about winning and so that is the main value to focus on. In short, how many more “wins” is Player X worth over a run-of-the-mill generic player?

Dr. Andrew C. Thomas, best known for WAR on Ice, detailed the history of stats that tried to describe a player’s value in one simple number at the beginning of a series of posts that outlined his logic in creating his approach to WAR. Plus-minus, believe it or not, was an attempt to do this. It is flawed for many reasons: it only counts goals - which are not common events in games; it really only applies in even strength situations; and it tells us nothing other than that Player X was on the ice for goals for or goals against - not whether Player X did anything good or bad on the ice. Other approaches tried to take more information into account, such as Alan Ryder’s Player Contribution, Goals Versus Threshold from Hockey Prospectus (Tom Awad, I believe, ran the numbers), Point Shares at Hockey-Reference, and the WAR model developed by Dr. Thomas, Sam Ventura, and Alexandra Mandrycky.

Some of these gained traction. GVT definitely hit big partially because Hockey Prospectus was a key site for hockey analytics and mostly because it was an easy number to see. However, as Dr. Thomas wrote about, the drawbacks of those stats led many to question what these catch-all stats actually represented. Trying to figure out why Player X, who was a forward, had a lower GVT than Player Y, who was a goalie, was a challenge. And while the approach tried to measure defense, it is questionable how applicable it is to actual defense in a hockey game. Measuring defense in general remains a major sticking point in stats, especially in hockey. Due to the challenges and other factors - keeping up with the numbers, sites not necessarily having them available, the developers getting hired by teams - these stats tended to fade into memory.

All of this is the background that led to Game Score and Wins Above Replacement. Some of the lessons learned from those attempts were heeded as both were developed.

Game Score & Game Score Value Added

Back in 2016, Dom Luszczyszyn explained Game Score to the hockey stat world at Hockey-Graphs. If you have a subscription to The Athletic, then you may be familiar with the name, and you may have seen this stat in use. The idea of Game Score (GS) was inspired by Bill James in baseball and John Hollinger in basketball: determine whether the player had a really good game or not. The approach takes the stats you would see in a boxscore, weighs them, and calculates a number based on the stats and weights. The higher the score (anything above 4 is great), the better. From Luszczyszyn’s write-up, here is the original formula for skaters:

Player Game Score = (0.75 * G) + (0.7 * A1) + (0.55 * A2) + (0.075 * SOG) + (0.05 * BLK) + (0.15 * PD) – (0.15 * PT) + (0.01 * FOW) – (0.01 * FOL) + (0.05 * CF) – (0.05 * CA) + (0.15 * GF) – (0.15 * GA)

Note: PD is penalties drawn and PT is penalties taken.

And for goaltenders:

Goalie Game Score = (-0.75 * GA) + (0.1 * SV)

You can see which contributions are valued more highly than others. Production is highly valued, which makes sense as the player directly impacts the game by scoring or setting someone up. Players also get credit for shots, blocks, faceoff wins, drawing more penalties than they take, being on the ice for a shooting attempt (Corsi), and being on the ice for a goal scored by their team. They get punished for taking penalties, losing faceoffs, and being on the ice for a shot attempt against or a goal against. It is simple, but that is the point. For goaltenders, it is even more straightforward: saves are good, giving up goals is bad. The final result of either provides a simple number that explains how good their game was. It can also be averaged over a whole season, which can tell us how much the player contributed.
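For illustration, the two original formulas above can be written as a pair of small functions. This is a minimal sketch using the quoted 2016 weights; the function and argument names are my own shorthand for the formula’s abbreviations, the example stat line is made up, and this is not the later expected-goals revision of the formula.

```python
# A minimal sketch of the original 2016 Game Score formulas quoted above.
# Argument names mirror the formula's abbreviations (G, A1, A2, etc.).

def skater_game_score(g, a1, a2, sog, blk, pd, pt,
                      fow, fol, cf, ca, gf, ga):
    """Game Score for a skater from box-score and on-ice counts."""
    return (0.75 * g + 0.7 * a1 + 0.55 * a2
            + 0.075 * sog + 0.05 * blk
            + 0.15 * pd - 0.15 * pt
            + 0.01 * fow - 0.01 * fol
            + 0.05 * cf - 0.05 * ca
            + 0.15 * gf - 0.15 * ga)

def goalie_game_score(ga, sv):
    """Game Score for a goaltender: saves help, goals against hurt."""
    return -0.75 * ga + 0.1 * sv

# A hypothetical night: a goal, a primary assist, 4 shots, a block,
# a penalty drawn, even faceoffs, and decent on-ice shot/goal counts.
print(round(skater_game_score(1, 1, 0, 4, 1, 1, 0, 5, 5, 12, 8, 2, 1), 2))  # 2.3
print(goalie_game_score(2, 30))  # a 30-save, 2-goals-against night: 1.5
```

You can see from the weights why a multi-point night dominates the result: a goal alone is worth as much as ten shots on net.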

Luszczyszyn found that this is fairly repeatable from season to season. He was also up front about its flaws. It is very tilted towards offense as most hockey stats count offensive contributions. There are no considerations in this formulation for the context of the player’s usage, the player’s team, the player’s opponent, and nothing about special teams at all. It is meant to be a simple way to calculate whether Player X did well that night. And if Player X has a lot of good nights, then we can say Player X had a good season. That is the intent.

Luszczyszyn deserves credit for improving the model over the past few years. Specifically, in June 2019, he wrote at The Athletic ($) that the Game Score formula needed improvement and explained what he did. Namely, he added expected goals to the formulation: both on-ice expected goals for and against for the player, and individual expected goals replacing individual shots on net. The weights for goals for and goals against were also adjusted, especially for defensemen. So if you see GS or GSVA, then pay attention to the year, as the current formula is more complex and refined.

Luszczyszyn did expand on the Game Score concept to come up with Game Score Value Added (GSVA). It uses the same Game Score concept but with a player’s stats over the past three seasons, translated into wins. It also includes adjustments for usage, the player’s teammates in 5-on-5 play, and the player’s opponents in 5-on-5 play. This not only addresses some of the flaws he identified in 2016, but it also effectively does what WAR does in baseball: it distills a player’s value into one convenient number. A player at zero or below is considered replaceable. A forward above 3 and a defenseman above 2.5 are considered “elite.” It respects that the game of hockey is hard to break down into individual performances, so a player providing contributions worth 2 or 3 wins should be seen as an excellent player.

GSVA is generally used to project a player’s or a team’s season. If you read Luszczyszyn previewing a team at The Athletic ($), especially now that a 2021 season is coming, then you have definitely seen GSVA. (I’m sure the New Jersey Devils preview is coming in the next few weeks; he just started with the all-Canadian North Division.) Because the calculation is pretty simple, you may have seen other people utilize it. GS was a part of Emmanuel Perry’s Corsica site. However, Corsica seemingly does not work anymore. The main way to see GS and GSVA now is whatever Luszczyszyn writes/posts online - outside of doing it yourself, which is not that difficult given that the formula is just a long equation.

This is unfortunate for two reasons. One: public access to a stat is a major factor in whether it gains traction in the larger community. If Luszczyszyn leaves the online scene for one reason or another, then GS may end up where GVT is now - a memory of the past. Two: it is a very good attempt at what it tries to do. It tries to combine a bunch of stats that a player has in a game into a simple number to determine whether they did well, and it does that. While it does not include every possible stat, it is easy to see why each stat is included. Production matters a lot in a game, which is included. Winning draws, being on the ice for your team taking attempts, being on the ice and creating shots with high expected goal values, drawing more calls than taking them, and even blocks can all contribute positively to a team. It is still offensively-minded, but that is more due to the nature of hockey stats themselves. It is simple to present, relatively simple to calculate, and simple to report on.

The Younggrens’ Goals Above Replacement / Wins Above Replacement

While the concept of incorporating WAR into hockey is not new, the attempt made by Josh and Luke Younggren has stuck around. In 2018, they presented their model at the Rochester Institute of Technology Hockey Sports Analytics Conference, opened their site Evolving-Hockey, and the model caught fire. It helped that it came out while hockey was in its offseason, that there was a competing WAR model by Emmanuel Perry of Corsica, and that there was a massive amount of online beef when James Mirtle of The Athletic ($) talked to Matt Cane and Tyler Dellow about it, who did not think very much of WAR. By the way, Cane and Dellow work for the New Jersey Devils. Even I wrote all about it back in 2018 when the Twitter arguments were still steaming in the offseason light.

Over the past two years, the Younggrens’ approach to WAR has become the main one used whenever GAR or WAR is cited in hockey. Corsica not really functioning helped with that, as did the Younggrens’ Evolving-Hockey site working rather well prior to its paywall being put up. They also received a big boost when they explained in detail how they came up with their model at Hockey-Graphs in a three-part series of posts in early 2019. The first part goes into the thinking behind it, which does reference the article I wrote and a piece CJ wrote with respect to Sami Vatanen in October 2018 where the twins were interviewed. The second part goes more into what is in the actual regression model, which incorporates a model primarily developed in basketball as a starting point: Regularized Adjusted Plus-Minus (RAPM - and, no, it does not actually include plus-minus). The third part goes into defining replacement level, converting the results of the model into wins, and final thoughts. While I and others have misgivings about GAR/WAR, these posts did a lot to garner respect for the Younggrens and their model. So did the fact that their site was easy to use and much more reliable than Corsica.

Unlike GS or GSVA, the mathematics involved for GAR/WAR is much more complex. It utilizes multiple types of regression modeling to obtain the weights for the stats involved. Here is a high-level summary of what goes into GAR/WAR and what it means:

  • First, RAPM is run for the players. The general idea of RAPM is to give us a target value (like goals, Corsi, expected goals) for a player adjusted for their teammates, opposition, score state, and more. Basically, it is meant to give a value to a player on their own.
  • Second, Statistical Plus-Minus models are then run for the other stats the player may have. Again, despite the name, plus-minus is not one of them - it is another model pulled from basketball. This model uses all the stats a player may have on a box score, including zone starts and some relative stats (which are not the same as the relative stats at other sites), along with all Real Time Super Stats (blocked shots, takeaways, giveaways, hits, missed shots). Four values are determined from this: even-strength offense, even-strength defense, power play offense, and shorthanded defense. Multiple models are run at this stage for each of the different values.
  • Third, team adjustments are applied to those Statistical Plus-Minus models. This is influenced by another basketball model: Box Plus-Minus by Daniel Myers.
  • Fourth, penalty goals are determined for the player. This determines how many goals a penalty taken is worth in general. This is used in place of penalty differential.
  • Fifth, now that the Goals Above Replacement values are determined for even strength (EV_GAR), power play (PP_GAR), shorthanded (SH_GAR), and penalties (Pens_GAR), they need to be converted to wins for Wins Above Replacement. A baseball concept called the Pythagorean Expectation is utilized to calculate goals per win. The GAR values are divided by this to get to WAR.
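The final conversion step can be sketched with made-up numbers. Under the Pythagorean Expectation (win% = GF^x / (GF^x + GA^x)), an average team near GF = GA gains about x / (4 × goals per game) wins per extra goal, so its inverse gives a goals-per-win figure. Everything below is illustrative: the exponent, scoring rate, and component GAR values are my assumptions, not Evolving-Hockey’s actual numbers.

```python
# Hypothetical sketch of the GAR-to-WAR conversion described above.
# Differentiating win% = GF^x / (GF^x + GA^x) at GF == GA gives
# x / (4 * goals-per-game) wins per goal, so goals-per-win is its inverse.

def goals_per_win(team_goals_per_game, exponent=2.0):
    """Marginal goals needed per extra win for an average (GF ~= GA) team."""
    return 4.0 * team_goals_per_game / exponent

def war_from_gar(ev_gar, pp_gar, sh_gar, pens_gar, gpw):
    """Sum the situational GAR components and convert goals to wins."""
    total_gar = ev_gar + pp_gar + sh_gar + pens_gar
    return total_gar / gpw

# With roughly 3 goals per game per team and an exponent of 2, a win
# costs about 6 goals, so a hypothetical 12-GAR season (8 EV, 2 PP,
# 1 SH, 1 penalties) works out to about 2 WAR.
gpw = goals_per_win(3.0)                      # 6.0 goals per win
print(war_from_gar(8.0, 2.0, 1.0, 1.0, gpw))  # 2.0
```

This is why GAR and WAR tell the same story at different scales: WAR is just GAR divided by a league-wide goals-per-win constant.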

If that seems like a lot, then that is because it is. It is the opposite of Game Score: an incredibly complex approach to come up with a rather simple number, all to describe a player or a team as good or not. Or simple numbers, as a player’s GAR or WAR will generally come with the breakdown by even strength, power play, shorthanded, and penalty values. This is clearly the result of well-thought-out plans and consideration of a lot of details. So much so that I can understand being unhappy when some dismiss the use of regression models as simply throwing a bunch of stats into a “black box” and getting a result.

There have likely been improvements to this approach since it was first explained in 2019. It has been close to two years since those posts at Hockey-Graphs, and even then, the Younggrens were writing about their third version of the model. Newer versions may have come out since then, but I cannot confirm that. A quick peek at the Evolving-Hockey site shows there is an “xGAR” section - which I presume is meant for an expected value of a player’s or team’s GAR. I do not know whether the actual GAR/WAR approach has changed or how it has changed. Still, what they did come up with is nothing short of impressive in terms of its utilization of models from other sports and how in-depth the twins thought about the game.

Unfortunately, its access is an issue. The Younggrens decided to put the RAPM, GAR, and xGAR results behind paywalls on their Evolving-Hockey site. Their site is otherwise quick to use and reliable, much in the ways that Corsica was not. It even has some really novel features, such as a contract projection page. Even Dom Luszczyszyn pulls the data he uses for Game Score from Evolving-Hockey. It has grown to be a big site. But it is not free. All of the hockey stat sites past and current, from Gabe Desjardins’ Behind the Net to Darryl Metcalf’s Extra Skater to WAR on Ice to Corsica to Brad T’s Natural Stat Trick, were/are freely accessible. You did not need to pay a single cent to see the data they scraped from the NHL, with or without calculation. The Younggrens require you to be a subscriber to their Patreon to see the fruits of their labor. It costs $5 per month. Your budget may vary on wanting to see some hockey stats based on multiple statistical models that are not without their issues. In the interest of disclosure: I do not pay $5 per month to see them.

Those Three Biases Strike Again

I am glad that kmac6 asked for GAR to be covered in the comments of the previous post in this series. That post went into detail about three biases that are common in hockey stats: the Streetlight Effect, Success Bias, and Scorer Bias. Both GS / GSVA and GAR / WAR are especially subject to those three biases.

First, both approaches to a “catch-all” statistic are based solely on stats that are already recorded and available. That is the Streetlight Effect in and of itself. For measuring offense, it may be OK in that the events the scorer recorded are typically actions made by the team with the puck.

For measuring defense - even strength or shorthanded - it is problematic because there are really no events that get reliably recorded in favor of the defense. Sure, blocks, takeaways, and not giving up goals or shooting attempts or higher expected goals values point to defensive work. But anyone who has played defense in a sport like hockey, basketball, football, or soccer can tell you that effective defensive play can prevent an action from even occurring, such as deterring someone from shooting or denying a potential passing lane. Good positional play and coverage are recognizable, but they are nearly impossible to consistently count and therefore measure. How do you even count something that does not happen? We do the best we can with what we have, but truly evaluating defense remains in the dark.

This can also apply to goaltending to a degree. Does a goaltender’s positioning deter a shooter from taking a shot? Or force the shooter to attempt a more difficult shot? From watching games or listening to players, the answer is yes. From the numbers that are recorded, it is unknown to what degree that is a factor.

Second, both approaches require the player to have played a significant amount of time. A call-up is simply not going to have enough ice time and games played to rack up a meaningful game score or RAPM or GAR/WAR value. This is the success bias in effect. This is how some players can be tabbed as “below replacement level” by either approach, even though they are remaining in the lineup for one reason or another.

Third, scorer bias is definitely an issue with both. Scorers can undercount or overcount shots on net. They can be inaccurate with shot locations, which dramatically impacts expected goal values. Just as I wrote in the previous primer, the process may still work - but the results are impacted by this. It is more of a problem for GAR / WAR, as the Younggrens use all of the Real Time Super Stats. Even if they are weighted with really small values, it is still a factor. And if they are weighted so small that they should not have an impact, why include them at all?

Because of these biases and one other issue, I really do not use either regularly. (There are other reasons, more about GAR / WAR, but this is a primer and not a “Why I think GAR / WAR is bad” post.) As incomplete as they are, I at least know what I am getting out of Corsi and expected goals. Between the two, I am far more in favor of Game Score because I better understand the concepts behind it. I really do not know if the multiple models used in GAR / WAR are used appropriately, and I do not know if the Younggrens did enough to any of those models to adjust their results to represent the game of hockey. Regression models are a double-edged sword. They can highlight something as being more or less important than you thought. But they also have no concept of the thing they are measuring as it is just math, so it is on the modeler(s) to make sure the model does not output something ridiculous, like stating bodychecks are the most valuable thing a hockey player can do. That they or their adherents struggle to explain away specific results - Mike provided some examples in the comments of the 2018 post - does not instill more confidence in their approach either. At least with Luszczyszyn’s approach, I can work out how the GS or GSVA result for a player or a team came to be. But even with that, it is only giving me a summation of what is happening. I would rather go to Natural Stat Trick and go deep into the 5-on-5 stats - something I would need to do anyway if I needed to figure out how a player’s GS came to be.

They could also be improved if they were far more accessible. This is my other major issue with both and another reason why I do not regularly use them. (It is also why I do not have examples; I would have used P.K. Subban and Mirco Mueller by these measures, among others. I know from memory that GAR / WAR listed Subban as below replacement but Mueller was not - but without the numbers to confirm, I cannot go with it. Alas.) I cannot really dig into them on my own. I understand The Athletic’s whole approach is to get sports fans to pay for sports writing. I also understand that the Younggrens want to make some money off their work. But hockey stats derived from freely scraping NHL data from NHL sites need to be accessible to maintain their usage and be more accepted, especially by people who have not really heard of them before or do not have a lot of confidence in them. By making at least the general numbers more freely available, more fans can get into them, peruse them, and effectively spread the word about their value. What is the point of coming up with a catch-all stat that is meant to provide one simple number to say whether some player or some team is good if only a select group can see it? Game Score was at least at Corsica, but Corsica seems to be non-functional now, so unless it is somewhere I am not aware of, I do not see it. And, again, I do not pay the Younggrens $5 per month to use their site. But that is their business, not mine. Your mileage may vary.

The good news is that models can be improved. As data gets better and these biases are addressed in their own ways, these approaches will benefit from it. Likewise, the models can be adjusted as needed to account for the biases. They can also be presented in a way that makes clear they have their limitations, as Luszczyszyn did at The Athletic when he updated Game Score’s formulation last year. As hockey stats improve, these and future approaches for one descriptive stat will also improve. I do think there will be future attempts at it. Being able to distill a player’s contributions into one easy-to-state number and have it be used to describe whether the player is good and/or better than someone else is a massive accomplishment. It can be a great jumping-off point for wanting to learn more about why a player is good, just as it would be a sufficient way to state that someone is objectively good to someone who just wants to know that. For now, these are the two that are leading in the hockey stat scene.

Your Take

I hope I provided enough detail to give you all a high-level understanding of these two catch-all stats. They are meant to be descriptive, and they are meant to distill a lot of information into one easy-to-distribute value. However, using these values intelligently requires that level of understanding, which is a lot more complex than for the other stats covered so far in this series. This is not to say they should never be used; just that there is a lot more going on behind the scenes that explains why a value is what it is. If you have any further questions or thoughts about either, then feel free to leave them in the comments.

I want to thank everyone who has read this and the other primers in this series. If you click on “Hockey Stat Primer” at the top of the post, then you can easily see all of the other posts in this series. You can bookmark it for later reference or share the link with others to see all of what I have written over the past two months from Corsi to Game Score and GAR / WAR. Thank you for reading.