Happy Thanksgiving. I hope you all counted your blessings. After last week’s primer about percentages and PDO, this week’s primer is going back to shots - but with an important twist. The first primer was about Corsi, which is the count of all shooting attempts. One of the big reasons why it was such a hot topic was its potential in predicting future success. However, anyone who has seen a hockey game knows full well that a shot from the blueline is less likely to go in than a shot from the slot. We know that defenses protect the “high danger areas” and offenses work to get opportunities there. How do we account for those kinds of shots? There are two ways to do so: scoring chances and expected goals.
What are Scoring Chances?
Scoring chances are unblocked shots from the crease out to the faceoff dots in the circles and up to the tops of those circles. It includes the high slot, the slot, the area around the crease, and the inner halves of both circles. Here is the region on the ice. Any unblocked shot inside the black line is a scoring chance and any unblocked shot outside of it is not.
It is technically true that any shot taken has a chance to go in the net. However, not all shots should be considered the same as scoring chances. Those taken outside of this highlighted region tend to get stopped, assuming they even get to the net at all. Either the distance or the angle in the areas around this zone is not favorable for shooters. This past season, the vast majority of goalies stopped shots outside of this region at a range of 95% to 98% per Natural Stat Trick. As for shots within that region, the vast majority of goalies stopped “medium danger” shots at a range of 85% to 94% and “high danger” shots at a range of 77% to 87%. The shots within the above region have become goals much, much, much more often than the ones outside of it. That supports why this region matters so much.
How is it Counted? And What is Danger?
A bit of history explains how we came to the current situation. Scoring chances were initially manually tracked. In fact, there was quite a bit of it at SB Nation including at this site. Trackers would count who was on the ice when a shot was taken in this region and confirm that the shot was in this region. Similar to Corsi, a player would receive credit for being present for scoring chances for and get punished for being present for scoring chances against. Between it being a manual process, the challenge of compiling data over multiple games, and the fact that Fenwick approximated scoring chances, this did not last. (My memory tells me Eric Tulsky found this and proved it on NHL Numbers earlier this decade but I cannot find a link to back that up. Sorry.) The counting of scoring chances died pretty quickly.
However, the stat was revitalized with the introduction of War on Ice. The fine people at the hockey stat resource - Dr. Andrew Thomas, Sam Ventura, and (later) Alexandra Mandrycky - found a way to remove the manual nature of tracking it and strengthened the concept based on research in the field. They found that in the metadata of the NHL play by play logs, there were (x,y) coordinates for each shot. We see it in the log as just a distance to the goal. They were able to figure out how to automatically count up the chances taken by a player and on a team. They basically used the same region as has been used for scoring chances, although it was presented slightly differently with a longer region for the high slot.
At the end of 2014, War on Ice introduced danger levels to better describe scoring chances. Low danger chances would only be counted for unblocked rebounds and shots off the rush outside of the scoring chance region. Medium danger chances would be all unblocked shots in the scoring chance region outside of the slot and the crease. High danger chances would be all shot attempts in the slot and the crease. This has since been refined to the following definitions of danger at War on Ice in 2015. They have been solid enough that they are still used to this day at Natural Stat Trick.
- Every shot attempt in the offensive zone is given a rating.
- All shots outside of the scoring chance area get a value of 1. All attempts in the scoring chance area but outside of the slot and crease areas get a value of 2. All attempts in the slot and crease area get a value of 3.
- If the attempt was blocked, then the value is reduced by 1.
- If the attempt was taken off a rebound or off the rush, then the value is increased by 1.
- Any attempt with a value of at least 2 is a scoring chance.
- An attempt with a value of 3 or higher is a high danger scoring chance.
- An attempt with a value of exactly 2 is a medium danger scoring chance.
- Any attempt with a value of 1 or 0 is a low danger scoring chance and is not included in their Scoring Chance stat.
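To make the rules above concrete, here is a minimal sketch of how they could be applied to a single shot attempt. The zone labels and the `ShotAttempt` structure are my own placeholders, not anything from War on Ice or Natural Stat Trick, and I am assuming that an attempt that is both a rebound and off the rush still only gets one bonus point, since the rule says "rebound or off the rush."

```python
from dataclasses import dataclass

# Hypothetical zone labels: base values follow the rules listed above.
BASE_VALUE = {"outside": 1, "chance_area": 2, "slot_crease": 3}


@dataclass
class ShotAttempt:
    zone: str              # "outside", "chance_area", or "slot_crease"
    blocked: bool = False
    rebound: bool = False
    rush: bool = False


def danger_value(attempt):
    """Apply the location value, then the blocked and rebound/rush adjustments."""
    value = BASE_VALUE[attempt.zone]
    if attempt.blocked:
        value -= 1
    # Assumption: a single +1 even if the attempt is both a rebound and a rush shot.
    if attempt.rebound or attempt.rush:
        value += 1
    return value


def danger_level(attempt):
    """Return "high", "medium", or "low" based on the final value."""
    value = danger_value(attempt)
    if value >= 3:
        return "high"
    if value == 2:
        return "medium"
    return "low"


def is_scoring_chance(attempt):
    """Only attempts with a value of at least 2 count as scoring chances."""
    return danger_value(attempt) >= 2


print(danger_level(ShotAttempt("outside", rebound=True)))      # medium
print(danger_level(ShotAttempt("slot_crease", blocked=True)))  # medium
print(is_scoring_chance(ShotAttempt("outside")))               # False
```

Notice how a rebound from outside the scoring chance area gets bumped up to a medium danger chance, while a blocked attempt from the slot gets knocked down to one.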
The purpose of adding danger is to reflect the reality that the shots taken in the slot and in the crease area are more likely to go in than ones taken in the high slot or from the inner circles. While those are more likely to go in than shots from outside of the zone, the prime real estate on the ice is the space between the circles to the crease itself. This is what we see penalty kills primarily focus on protecting. This is what we see players on defense focusing on in the run of play. This is why some teams overemphasize players who are “adept” in playing around the net.
Applying the concept of danger to scoring chances makes it more meaningful to what we see in the game. It really helps determine the quality of the shots that we see while watching the games and, like Corsi, we can see what kinds of chances players and teams are generating and which chances players are on the ice for, both for and against their team.
Scoring chances are commonly abbreviated as SC and are sometimes referred to simply as chances. They are presented similarly to Corsi with for and against stats, for-percentage stats, and with all of the same contextual considerations used with Corsi rates (e.g. an SCF% of 50% is break-even, meaning your team and the opposition took the same number or rate of chances; be careful when comparing rates between teams). For danger, they are abbreviated based on their level: LDC for Low Danger Chances, MDC for Medium Danger Chances, and HDC for High Danger Chances.
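If you want to see how the for, against, and percentage presentations relate to each other, here is a small sketch. The function names and the totals are made up for illustration; they are not from any stat site.

```python
def per_60(count, toi_minutes):
    """Convert a raw count into a rate per 60 minutes of ice time."""
    return count * 60 / toi_minutes


def for_percentage(events_for, events_against):
    """Share of all events that belonged to your team, as a percentage."""
    return 100 * events_for / (events_for + events_against)


# Hypothetical totals: 25 chances for, 25 against, in 50 minutes of 5-on-5 play.
scf, sca, toi = 25, 25, 50.0

print(per_60(scf, toi))          # 30.0 chances for per 60 minutes
print(for_percentage(scf, sca))  # 50.0, which is break-even
```

The rate puts teams and players with different ice time on the same scale, and the percentage tells you which side of break-even they landed on.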
Rebounds and Rushes
The refined War on Ice definitions for danger rely on assumptions about two types of shots that tend to lead to goals: rebounds and shots off the rush. Rebounds are self-explanatory. Someone takes a shot, the goalie makes a save, and because of physics, the puck rebounds out into space. It is common that a goalie is not going to be in a good position to make a save on a rebound, so as long as the attacking team can A) find the rebound and B) put the rebound in the right location, that can lead to goals. Shots off the rush include breakaways, odd-man rushes like 2-on-1s or 3-on-1s, and even-man rushes like 2-on-2s or 1-on-1s. As the play transitions quickly onto offense, the attacking team has more space than usual and has the defense on their heels, which makes it harder to defend. It is also harder for the goalie as they have to focus on the puck carrier, consider any options, and not show any signs of a move to give the potential shooter an idea on what the goalie wants to do. While we may lament players who do not finish shots on the rush - Miles Wood comes to mind - these do tend to lead to a lot of goals as well.
The issue here is that there is no event in the play by play log for rebounds or shots off the rush. Neither the log that is posted with each game at NHL.com nor the metadata for that log indicates whether a shot was a breakaway, off an odd-man rush, or even a rebound. Since we know these are plays that yield plenty of goals and plays teams strive to create, they cannot be ignored. The old hockey cliche of “get pucks to the net” absolutely applies. It is absolutely something players do in games whether it comes from throwing anything from a low danger area in the hopes of generating a high danger opportunity or just trying to work the puck in front and going for a “jam play” on the goalie. The trio at War on Ice came up with the following assumptions:
- A shot off the rush is any shooting attempt taken within 4 seconds after an event in the neutral or defensive zone without a stoppage in play.
- A rebound is any shooting attempt taken within 3 seconds after a shooting attempt without a stoppage in play.
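Here is a rough sketch of how those two assumptions could be applied to a list of events. To be clear, this is my own illustration and not War on Ice's code. The event labels, the `seconds` and `zone` fields, and the idea that `zone` is already stated from the shooting team's point of view are all assumptions on my part. It also follows the wording above literally, so any earlier shooting attempt inside the window counts toward a rebound.

```python
# Hypothetical event labels for shooting attempts.
SHOT_KINDS = {"SHOT", "MISS", "BLOCK", "GOAL"}
REBOUND_WINDOW = 3  # seconds after a prior shooting attempt
RUSH_WINDOW = 4     # seconds after a neutral or defensive zone event


def flag_attempt(events, index):
    """Return (is_rebound, is_rush) for the shooting attempt at events[index].

    events is a list of dicts sorted by time, each with "seconds", "kind",
    and "zone" ("off", "neu", or "def" relative to the shooting team).
    """
    shot = events[index]
    is_rebound = False
    is_rush = False
    # Walk backwards until we leave the longest window or hit a stoppage.
    for prev in reversed(events[:index]):
        gap = shot["seconds"] - prev["seconds"]
        if gap > RUSH_WINDOW or prev["kind"] == "STOP":
            break
        if prev["kind"] in SHOT_KINDS and gap <= REBOUND_WINDOW:
            is_rebound = True
        if prev.get("zone") in {"neu", "def"}:
            is_rush = True
    return is_rebound, is_rush


# Hypothetical sequence of events, for illustration only.
events = [
    {"seconds": 100, "kind": "TAKEAWAY", "zone": "neu"},
    {"seconds": 103, "kind": "SHOT", "zone": "off"},
    {"seconds": 105, "kind": "SHOT", "zone": "off"},
    {"seconds": 106, "kind": "STOP", "zone": None},
    {"seconds": 107, "kind": "SHOT", "zone": "off"},
]

print(flag_attempt(events, 1))  # (False, True): shot 3 seconds after a neutral zone takeaway
print(flag_attempt(events, 2))  # (True, False): shot 2 seconds after another shot
print(flag_attempt(events, 4))  # (False, False): a stoppage came in between
```

Even in this toy version, you can see the time windows are doing all of the work. Nothing in the data says what actually happened on the ice in those seconds.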
In practice, these assumptions may not hold up. (You can consider this a drawback for the stat.) In the NHL, a lot can happen in 3 seconds after a shot is taken. While it would include shots taken right after a save is made, it could also include a rebound corralled by the offense and passed to someone else for a shot. Or it could include a defensive player trying to clear the puck, getting stopped, and then the offense firing away. Likewise, this definition of a shot off the rush does not consider the situation on the ice. It is one thing to see a blocked shot by the defense lead to a favorable bounce that a player takes and goes off on a breakaway. It is another to see a blocked shot by the defense lead to a change in possession - only for a soft shot to be taken because the previously attacking team backchecked hard enough to turn a potential 2-on-1 or 3-on-2 into a 3-on-3 or 3-on-4. Those possibilities can happen in a game but they may not be as frequent as an actual rebound try or shot off the rush. They are wide definitions but given the limitations in the data, it is the best thing going without throwing out too many events that would legitimately fit those definitions. Just know that the current assumptions used are not absolutely perfect.
The benefit of scoring chances with danger is that it allows us to make some clearer conclusions about how a team performs and where they can make improvements.
For example, the 2019-20 New Jersey Devils were not that bad at defending the high danger areas on the ice. According to Natural Stat Trick’s 5-on-5 data, the Devils’ high danger chance against rate was 10.67 per 60 minutes. That was the 15th lowest rate in the league last season. That is around the league median. We can call that decent. However, their rate of allowing scoring chances - which includes those medium danger areas around the slot and crease - was 29.81 per 60 minutes. That was the 29th lowest rate, or the third highest rate, in the league. That is not only terrible, but it points to the real weak point of the Devils’ defense last season. They may have prioritized defending the slot, the crease, and things like rebounds and rush shots. But they allowed a lot of good opportunities within the scoring chance zone outside of those areas. They may not be as dangerous, but the volume of them definitely caught up to the Devils. When fans say the Devils’ defense was bad and/or needs a lot of improvement, the rates of scoring chances against the Devils support that.
Additionally, the 2019-20 Devils were not good at scoring goals. In 5-on-5 situations, they scored just 2.27 goals per 60 minutes - the seventh lowest rate in the league per Natural Stat Trick. Trading Taylor Hall and Blake Coleman during the season certainly did not help. But the issue lies deeper than that. The Devils generated just 24.05 scoring chances per 60 minutes and 9.95 high danger scoring chances per 60 minutes. The scoring chance rate ranked as the third lowest in the league and the high danger chance rate ranked 24th out of 31 teams. Moving Hall and Coleman did not help, but the squad as a whole had problems generating offense. It speaks to issues with the tactics, the offensive philosophies, and concepts by the coaches in addition to the personnel. It could even be seen as a reason why Hynes was dumped, Nasreddine was kept only as an assistant, and the team reached out to Lindy Ruff and Mark Recchi to join the organization. That is speculation on my end, but the data would support such a personnel move.
That said, trading Hall and Coleman did hurt the cause for the Devils generating scoring chances. They were first and second on the team last season in terms of individual scoring chances taken per sixty minutes. This is a rate of how many scoring chances - attempts in the scoring chance areas - that the player took on their own. Interestingly enough, the third best generator of scoring chances in 5-on-5 situations last season was Miles Wood at 7.99 per 60 minutes, and he took the second most total scoring chances last season with 108. If your memory of Wood involves him getting to good areas to shoot and getting opportunities off the rush, then this stat supports that. If nothing else, he can generate opportunities for himself. Unfortunately, Wood’s finishing left a lot to be desired as he scored just nine goals in 5-on-5 hockey last season. So your memory of Wood getting good opportunities to score and not scoring on a lot of them is also supported by data.
Another Method for Shot Quality: Expected Goals
While scoring chances with danger were being implemented, one of the drawbacks is that it bins shots by location and relies on those locations and other assumptions to determine danger. Shot location is important but it is not everything. There are other factors that could come into play to be accounted for such as the type of shot and past history of scoring from those locations. And the defined areas could be argued to have some fuzziness on the edges. After all, have we not seen Alex Ovechkin and Kyle Palmieri be lethal from one-timers by the right dot on power plays? Even if they are a foot or two behind the dot and technically not in the scoring chance area, we would think they are fairly good places to shoot from. They have had enough success to want to keep seeing it. Rather than not give those shots credit because they were technically outside the area, another method would better reflect their value. We can do this with an Expected Goals model.
The main idea of the model is that every shot taken has a probability to score based on the kind of shot taken, where it is taken, and other factors. This probability is called an expected goal value. You can sum up all of the expected goal values taken in a game by a team and that can be a team’s expected goal value for the game. While the score may not accurately reflect this value, it does provide an idea of how well each team has done at generating opportunities to score. The model is just that: a model. It cannot predict the goalie having a bad night, a shooter getting all of the bounces, flukes, and so forth. But over a long period of time, the model can sort out who has been good (or bad) at generating goal scoring opportunities and which players have been good (or bad) at it too.
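As a quick sketch of the "sum it up" idea, here is how a game's expected goals could be totaled per team. The team labels and the per-shot probabilities are invented for illustration; a real model would supply the probability for each shot.

```python
from collections import defaultdict


def team_xg(shots):
    """Add up the expected goal value of every shot, grouped by team."""
    totals = defaultdict(float)
    for team, xg in shots:
        totals[team] += xg
    return dict(totals)


# Hypothetical (team, expected goal value) pairs, for illustration only.
shots = [
    ("NJD", 0.08),
    ("NJD", 0.31),
    ("OPP", 0.04),
    ("NJD", 0.12),
    ("OPP", 0.22),
]

totals = team_xg(shots)
print({team: round(value, 2) for team, value in totals.items()})
# {'NJD': 0.51, 'OPP': 0.26}
```

In this made-up example, both teams took shots, but one team's shots were worth about twice as much by the model.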
This value is often referred to as xG. xGF means expected goals for and xGA means expected goals against. Similar to Corsi and scoring chances, you will see this commonly represented as for or against values either as raw counts or rates per sixty minutes. Additionally, you’ll see xGF%, which is the percentage of expected goals for over all expected goals. And all of the other contextual considerations from Corsi also apply (e.g. beware of comparing xGF%s of players between teams).
You can see this as a step up from Corsi in a way. More than just measuring the value of each shot, you could interpret this as a way to evaluate how a team is performing. Corsi focuses on the count of shooting attempts. Expected goals will reward the teams that take shots in more advantageous locations and more dangerous shots (e.g. one-timers are better than backhanders). A drawback of Corsi is that a team could “inflate” their Corsi by taking loads of shots from the point or half-wall. Those are shots opposing defenses will allow all day long because they are not that dangerous. An expected goals model would not reward that as it would assign those shots lower values. We can look at the xG of a game and note that the team with an xG of 3.3 likely out-performed the opposition with an xG of 1.8. While a team cannot win a game 3.3 to 1.8, we can at least come to that conclusion with the model. And it may tell us something that Corsi or Fenwick or simple shots on net would not.
Of course, expected goals models are not perfect. They are created with reflecting reality in mind, but there are only so many variables and datapoints available. The model makers do try their best to at least provide a theoretical idea of what should happen. The good news is that a model can be modified, refined, and updated. If it is mistaken, then we re-adjust with new data and findings. That has been happening in its own way within the last ten years.
Expected goals models in hockey have gone as far back as 2004, but they really picked up steam in the public sphere in 2015 when Dawson “Don’t Tell Me About Heart” Sprigings and Asmae Toumi came out with this model at Hockey-Graphs. This development led others in the analytics community to come up with their own models. Dr. Micah Blake McCurdy has his model at HockeyViz, which he recently updated. (Aside: If you want to delve into the details of what goes into a model like this, Dr. McCurdy’s post is excellent as he laid everything out.) Peter Tanner at Moneypuck has his own model. Brad at Natural Stat Trick has his own model, which is included on all of his major stat pages. Manny Perry’s Corsica had its own version of the model. The Younggren twins at Evolving Hockey have one too. What this means is that the values may differ a bit from site to site as their models may weigh certain data points more than others. All the same, it is a way to measure the team’s process - just like Corsi does with CF%.
In general, it is best to take a larger message from them. Rather than get hung up on whether the Devils’ 5-on-5 xGF% at Natural Stat Trick is more accurate at 46.85% or if Moneypuck’s 5-on-5 xGF% of 46.94% is more true to what they had, the larger message is that the Devils were really bad by an expected goals model. They were a bottom-five team in the league in this stat. And this is supported by their actual goals for percentage, which was also among the lowest in the league at 43.88%. This is another benefit of the expected goals model. We have an idea of what the Devils’ offense and defense could have resulted in, in a theoretical sense. If the actual goals scored and allowed differ a lot, then that may tell us something about the team’s process or about something else being an issue, like goaltending. Clearly, the 2019-20 Devils were not a very good team in either expected or actual goals.
Like Corsi, scoring chances and expected goals are best used for 5-on-5 situations. However, for special teams, I recommend focusing more on the preferred rate of each: the for-rate for power plays and the against-rate for penalty kills. Also like Corsi, I would recommend taking context into account for players.
Given the extended offseason, there has been no shortage of attention paid to Jack Hughes. He was the first overall pick in 2019 after a record-breaking season with one of the most talented United States National Team Development Program squads in recent memory. But he did not perform well in 2019-20. Should we be concerned? This is debatable, but the expected goals model at Natural Stat Trick gives me reason to think that we should not be so worried. Hughes put up an individual expected goals value of 8.69 in 5-on-5 play. You cannot score 0.69 of a goal no matter how nice that looks, but the point is that the model suggests he should have had 8 or 9 goals based on where he was taking shooting attempts. Hughes was trying to create opportunities. But his finishing (and the finishing of others) betrayed him. Hughes just had 2 goals in 5-on-5 hockey last season. The young man’s shooting percentage was a very low 2.38%. As the kids would say these days: Oof.
However, the encouraging part is that he generated enough attempts and scoring chances on his own to put up the fifth best individual expected goals value on the team behind Coleman (since traded), Wood, Palmieri, and Nico Hischier. Given his movement around the lineup along with the head coach and GM being fired as the team went from playoff hopefuls to lottery ball players within three months, coming in fifth on an NHL team in your 18-year-old season is pretty good. Even if his goal scoring did not match what the model suggested. Being able to create opportunities to score may be much more repeatable than scoring a bunch of goals from not-so-good locations. Hughes demonstrated that he can create opportunities for himself, not to mention for others. If it was the other way around where Hughes out-performed his individual xG, then I would be a bit more concerned for his scoring in the near-future. (It is also really hard to be a forward and score on fewer than 3% of your own shots too.)
There was a young Devil who did greatly outperform his individual xG last season, though. Jesper Bratt scored 14 goals in 5-on-5 play, second only to Palmieri’s 16. His individual xG was just 6.85. While he took almost as many shooting attempts as Hughes, Bratt had fewer scoring chances and high danger chances than Hughes. This suggests that Bratt had very favorable shooting last season and scored a bunch of goals that maybe he would not have if he took the same shots in another season. I am concerned that Bratt’s goal scoring may drop significantly unless he is able to generate more chances in 2020-21. (Or at least a rate of goals in 2020-21 since it is not likely 2020-21 will be an 82 or 72 or even 60 game season.) Hopefully, Tom Fitzgerald understands all of this as he still has to re-sign Bratt to a new contract and set new expectations for the young winger. Assuming this will repeat may blow up in his face.
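The Hughes and Bratt comparison comes down to simple subtraction: actual goals minus individual expected goals. Here is a small sketch using the 5-on-5 numbers cited above. The function name is just my own label for the difference.

```python
def goals_above_expected(goals, ixg):
    """Actual goals minus individual expected goals, rounded to two decimals."""
    return round(goals - ixg, 2)


# 5-on-5 goals and individual xG as cited in this post.
players = {"Hughes": (2, 8.69), "Bratt": (14, 6.85)}

for name, (goals, ixg) in players.items():
    print(name, goals_above_expected(goals, ixg))
# Hughes -6.69
# Bratt 7.15
```

A big negative number points to chances that did not get finished. A big positive number points to finishing that may be tough to repeat.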
Let us turn to a defenseman. About a month ago, the Devils signed Dmitry Kulikov. One of the criticisms of the signing was that, because he had such a low expected goals for percentage in 5-on-5 play, he would not really be able to help out on defense. How can he with an expected goals for percentage of 44.3% per Natural Stat Trick? However, this is where context matters. The 2019-20 Winnipeg Jets had the lowest expected goals for percentage in the entire league as they allowed a lot of scoring chances. Kulikov’s xGA/60 of 2.45 was around the median among all Jets skaters last season. His low percentage is a function of that with an incredibly low xGF/60 of 1.95 - which was also far from the worst among Jets players. That value suggests Kulikov is not meant for offense and his teammates certainly were not making it happen either. The model shows that we should pump the brakes on hoping Kulikov can really stabilize the blueline, but the context also shows that Fitzgerald did not just sign a pylon to a one-season contract. The context also shows that if you thought New Jersey’s defense was a problem last season, then Winnipeg showed last season that it could always be worse.
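You can check how Kulikov's two rates combine into that percentage yourself. This is a minimal sketch that assumes xGF% is simply the for-rate divided by the sum of the for and against rates, which works here because both rates cover the same ice time.

```python
def xgf_pct(xgf_per_60, xga_per_60):
    """Expected goals for percentage from for and against rates."""
    return 100 * xgf_per_60 / (xgf_per_60 + xga_per_60)


# Kulikov's 5-on-5 rates as cited above.
print(round(xgf_pct(1.95, 2.45), 1))  # 44.3
```

With the against-rate sitting at 2.45, the for-rate would also have to be 2.45 just to reach break-even. That shows how much of the low percentage comes from the offense side.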
A Note of Caution for Using Expected Goals to Make Predictions
The push for scoring chances and especially expected goals came in the form of predicting future goals. When Sprigings and Toumi released their model, it came with the understanding it could predict future goals better than Corsi. And as expected goals models are driven heavily by shot location, it made some sense to consider scoring chances as more valuable information than mere shot attempts or shots alone. However, their predictive value has recently been called into question.
Earlier this month, draglikepull compared and contrasted three expected goal models against Corsi For% to determine which is better at predicting future goals. One of the big drivers for all of analytics is to find which stats are predictive. Knowing a team or a player is good in a stat could provide an edge for a team, for someone making bets or playing fantasy hockey, or for a fan who is hoping for success. However, despite the complexity in expected goals models, DLP found that a simpler stat like Corsi is better at predicting future goals. By no means are the correlation factors high for any of them, but CF% had a higher one than any of the models DLP used. You can view the whole post for the details but it does take some of the value out of using xG - at least for making predictions. The “old” standard of Corsi apparently can do that job a little bit better. As a means to evaluate a team’s performance, xG is still something to use.
DLP did revise his recent findings. DLP found scoring chances - yes, the same discussed in this post - are a little more predictive in recent seasons than Corsi. His original conclusion remains. However, Peter of Moneypuck noted in a response on Twitter that he found expected goals models are more predictive for future wins if you take out rebounds. DLP noted that he was looking at goals and not wins, but you need goals for wins so it is an interesting interjection. Why would taking out rebounds make a model more predictive? It could go back to that assumption made with scoring chances about how rebounds are counted. Yet, scoring chances follow that same assumption and DLP found that scoring chances are more predictive. It could go back to how Peter’s own model handles rebounds, which could vary from model to model.
This is not to say that every model should be ignored or that expected goals is a farce. Again, models can be improved and adjusted as more data is collected. We should take them for what they are and primarily use them as tools for evaluation rather than necessarily prediction. That is at least my takeaway.
What’s Next & Your Take
All shots are not created equal and the stats of scoring chances and expected goals help clarify and measure them. By and large, where a player shoots the puck is going to matter a whole lot. So if nothing else, if your favorite player or team is not scoring a whole lot, then hope they create more dangerous opportunities in addition to more opportunities in general. Again, the old cliche is not wrong: try to get pucks to the net and hope for a bounce.
Of course, anyone who watches hockey knows that skaters are hardly individuals on an island. Their linemates matter. Their opposition matters. How the puck moves matters. We see coaches switch up lines or make specific matchups in games. We see teams play differently from unit to unit. How is all of that accounted for? Are they counted at all? Next time in this series, we will look at Teammates, Competition, and Passing.
In the meantime, I would like to know what you think and what you have learned about scoring chances and expected goals. Please provide any further questions about these stats in the comments. Thank you for reading.