Yesterday, I began a closer look at the concept of scorer bias. I looked at the concept in detail way back in 2010. I concluded then that, yes, it was real; yes, it was really happening at the Rock; and yes, it casts some doubt on the stats collected by the NHL and, by extension, used by others. I decided to re-visit the topic to see what the bias may be at the Prudential Center and in the entire NHL. In Part 1, I came to similar conclusions about giveaways, takeaways, and hits. In Part 2, I will focus on the other “super stats”: blocked shots and missed shots. For completion’s sake, this also includes shots on net.
In Part 1, I referenced this influential 2009 post at Hockey Prospectus by Tom Awad. He noted that, because of scorer bias, a lot of the work built on the stats collected by the league is suspect. This is not to conclude that it is worthless or a waste of time, just that there is an inherent flaw. The concepts and processes themselves are sound; it’s just that the data that goes into them may lead to results we do not expect. If the data is wrong, it is at least consistently wrong enough that the processes will work once the bias is corrected. As Awad put it:
The good news is that, since shots at both ends of the ice are affected, shot differential metrics, such as Corsi, are almost untouched. Player shooting percentages and shots on goal, while interesting statistics, are rarely ever used to judge players, with good reason: because a player controls both how many shots he takes and how well he takes them, it is rarely useful to rank players by these metrics, except when looking for outliers whose stats are likely to mean regress.
Still, we should at least see whether the components of Corsi - shooting attempts - are still impacted by scorer bias. Awad’s essay focused on how the Devils undercount shots. In Part 1, I showed that some of the counting of stats did change last season compared to the prior four seasons. Maybe that has held true for missed, blocked, and actual shots. Let’s find out.
As with Part 1, all data is from Corsica. The filters used for the team stats were all-situation play in the regular season from the last five seasons.
I defer to Kent Wilson, who gave the best quote about shot blocking I have ever read.
Blocking shots is like killing rats. Doing it is preferable to not, but if you’re doing it all the time it suggests you have bigger problems. — Kent Wilson (@Kent_Wilson) March 18, 2015
So how many blocks did the Devils get credit for at the Rock and on the road?
Well, the Devils and their opponents certainly have blocked far fewer shots in games played in Newark, New Jersey than they did away from there. By at least 100, and often much more than that, in each of the last five seasons. Seriously, the percent difference is massive, peaking in 2013-14 for the Devils and 2015-16 for opponents. Again, I can understand a slight difference due to variation in the game. But when the difference is this large, a table like this strongly suggests that the Devils’ scorer undercounts blocks. It appears that it improved somewhat in 2017-18, but it is still there. It is not as if the Devils or their opponents intentionally sell out their bodies for blocks only away from the Rock. There would be no reason to do so.
So the Devils’ scorer seemingly undercounts blocks. How does that compare with the league?
Surprisingly, it is not so stark in the league. Yes, there is less variation for blocks counted for away games compared to home games. That points to some bias for home counts compared to away counts. But the difference in mean counts between home and away has not been large for each of the last five seasons. The largest difference is from 2013-14 and that’s only by 34.3 blocks over 82 games.
Based on this table, the Devils’ counts for blocks for and against them at the Prudential Center have consistently been at least one standard deviation below the league mean. The count of blocks against was more than two standard deviations below the league mean in 2013-14 and 2015-16. It’s even worse for blocks by the Devils at home: they have been more than two standard deviations below the league mean in each of the last five seasons. This is further evidence that the scorer at the Rock has been undercounting blocks.
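The “standard deviations below the mean” comparison used in these tables boils down to a simple z-score. Here is a minimal sketch in Python; the block totals below are hypothetical placeholders, not the actual Corsica figures:

```python
# Flag home counts that sit unusually far below the league mean,
# measured in standard deviations (a z-score).
from statistics import mean, stdev

# Hypothetical home block totals for several teams in one season
# (placeholders, not actual Corsica data).
league_home_blocks = [1150, 1200, 1250, 1100, 1300, 1220, 1180, 900]

mu = mean(league_home_blocks)
sigma = stdev(league_home_blocks)

def z_score(count):
    """How many standard deviations a count sits from the league mean."""
    return (count - mu) / sigma

# A count more than 1 (or 2) standard deviations below the mean
# hints at a home scorer who undercounts the event.
for count in league_home_blocks:
    z = z_score(count)
    if z < -1:
        print(f"{count} blocks: z = {z:.2f} -> possible undercounting")
```

With these made-up numbers, only the 900-block outlier trips the flag, which is the same logic as reading the tables above against the league mean and standard deviation.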
To be fair, blocks are not always considered to be positive or negative events. Sure, in a moment or on one play, a block can make the difference between a goal against or not. But usually, a block is just that. To that end, Fenwick was devised; it counts unblocked shooting attempts. Or: shots on net and missed shots only. A missed shot sounds easy to understand, but in practice they may be difficult to identify. Is a deflected shot that missed the net by a foot a missed shot? What about an attempted pass that went by the net? What about an attempt that went so badly, and threatened nobody, that it sailed into the protective netting above the glass? I do not know the true answers. In any case, here is how missed shots were counted for and against the Devils at both the Rock and away from the Rock:
It is not as stark as it was for blocked shots, but there were signs of undercounting at the Rock. From 2013-14 to 2015-16, that appeared to be the case for the Devils’ missed shots as well as their opponents. In fact, the lean towards fewer missed shots counted at the Rock continued for Devils opponents into 2016-17 too. But there appeared to be a correction of sorts. The counts between home and road games for the Devils in the last two seasons were close. The difference between road and home missed shot counts for Devils opponents was also reduced. This has resulted in a five-season high in missed shots counted at the Rock and in total for the Devils in 2017-18. That the opponents had 65 more misses counted away from the Rock makes me think there could still be some bias; but it appears to be reduced compared to seasons before the last one.
How did the league do with missed shots? The breakdown is similar to blocked shots.
Just like blocked shots, the mean counts for misses at home and on the road were not that far apart over the last five seasons. The gap was in the range of 32 to 47 misses, so there was some difference. Over an 82-game season, that still is not a lot. Also just like blocked shots, the variation for home counts is larger than for road counts. That does point to bias in some rinks; but it is not as massive as it was for giveaways and takeaways.
The New Jersey Devils’ scorer was below the league mean by at least one standard deviation from 2013-14 to 2015-16. Again, whatever change was made (personnel? recording? definition/approach?) led to home counts falling within one standard deviation of the league mean. That was the case for missed shots against the Devils at the Rock in 2016-17 and 2017-18, as well as missed shots by the Devils at home in 2017-18. Good on the Devils’ scorer for that adjustment, as it appears the bias has been reduced in that regard.
Shots on Net
Of course, blocks and misses are just that: attempts to get pucks on target that did not work. The ones that matter most are the ones that do get on target. They are the ones goalies must stop. They are the ones we want defenses to suppress and limit. They are the ones we want offenses to create as often as possible. They are the ones that can become goals. Again, in his 2009 post, Awad showed that the Devils were notorious for undercounting actual shot totals. What about within the last five years?
There’s progress! Of sorts!
The good news is that the Devils’ own shot counts at home have not been too much different from their shot counts on the road. The difference between road and home counts is small enough to believe that the count at the Rock may be legitimate. This suggests that the home scorer is at least generally consistent with their peers throughout the league.
The bad news is that it is only in one direction. The shot counts against the Devils at the Rock have been consistently lower than what they are on the road. Before suggesting that this is a result of how the team plays on the road compared to at home, consider that the difference was as much as 214 shots over the last five seasons (last season was the largest home-away difference). That difference represents an average of 5.22 shots per game over 41 games. The lowest such average over the last five seasons was 2.41 per game. That’s not something to just ignore. This strongly suggests that the undercounting of shots is still occurring at the Rock. It is just not in favor of the Devils.
To put this in another perspective, according to Corsica’s all-situation stats, the Devils had the sixth-fewest total shots against in the whole NHL at home with 1,201 and the third-most on the road with 1,415. The scorer bias of 30 other rinks is not going to leave as much of an impact as the Devils’ own scorer, where the Devils played 41 games last season. Again, there are some differences between playing on the road and at home. However, I do not think those differences alone would cause a team to be one of the best shot suppressors at home and one of the worst away from home. I have to again suspect how shots are counted.
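The per-game figures above are simple arithmetic: divide the season-long home/road gap by the 41 home games. A quick sketch, using the totals cited above:

```python
# Convert a season-long home/road gap in shots against into a per-game rate.
HOME_GAMES = 41  # half of an 82-game NHL season

def gap_per_game(road_total, home_total, games=HOME_GAMES):
    """Average per-game difference between road and home shot counts."""
    return (road_total - home_total) / games

# 2017-18 all-situation shots against (via Corsica):
# 1,415 on the road vs. 1,201 at home, a 214-shot gap.
print(round(gap_per_game(1415, 1201), 2))  # prints 5.22
```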
What about the league as a whole?
For shots for, the home counts had higher means and standard deviations than road counts. While that is not a surprise, the gap in means is a bit larger than it was for missed and blocked shots. The standard deviations for road counts, while still less than those for home counts, are also higher than they were for those two stats. This is suggestive of biases at play at other rinks. That they were reduced on both sides over time suggests some possible improvements in consistency.
The Devils’ counts for shots for were at least one standard deviation below the league average from 2013-14 to 2016-17. Of course, those four teams were also not that good and did not shoot the puck all that much either. The counting can only account for so much. But that was back in “control” in 2017-18, which not-so-coincidentally featured a better team.
For shots against, while the means are appropriately flipped, the standard deviations did differ. The deviation is still higher for home counts, with the near-exception of 2013-14. That they were also reduced on both sides suggests improvements in consistency. However, that was not steady: the deviation was reduced the most in 2015-16 and then went up a bit in 2016-17 and 2017-18. Maybe it will return to that level in future seasons?
The Devils’ counts for shots against were at least one standard deviation below the league average from 2013-14 to 2015-16. The counts were within one standard deviation of the league mean in each of the last two seasons, though on the lower end. That still does not do much to dispel the notion of scorer bias, given the big difference in home and road counts for the Devils’ opponents.
So What Does This All Mean?
The scorer bias in the NHL does not seem to be as pronounced for shots on net, missed shots, and blocked shots as it is for giveaways, takeaways, and hits. Shots and blocks are usually easy to define. Misses have more of a grey area; but a miss is not something that can be as easily misinterpreted or missed entirely as a non-obvious giveaway, takeaway, or hit. There does appear to be a scorer bias toward counting more events at home rinks, but it is not as large as for the other three stats. And it differs from rink to rink.
The New Jersey Devils’ own scorer has been guilty of undercounting these three stats until recently. There appear to have been some adjustments made to close the gap, specifically for home counts of shots on net by the Devils. But there are still notable differences between road and home counts, both for and against the Devils, for these three stats. Especially with the shot counts of the Devils’ opponents at home.
The impact of this sort of scorer bias is large. Awad pointed to its impact on the analytics of the time, which were based on shooting attempts. That the counts could be so different leads one to wonder what is true. Is the Devils’ scorer really that stingy, or is their definition simply narrower than others’? Does this mean the Devils’ defensive performances, or their overall performances in the run of play at home, are not as good as the numbers suggest? It could be something as simple as questioning what actually happened in a game. A game could end with the Devils out-shooting their opponents 31-29 at the Rock. Is that number legitimate? Did the scorer just not see, or not count, two shots for the opponent? At the game, maybe I could catch that for either side; but I’m not going to recall it so easily well after it ended. I take the scorer’s count as evidence of what happened. Even if I were to doubt it due to what I do (or do not) know about scorer bias, I have no proof otherwise and no means to account for it. We have to take the stats as they are and hope the scorer got it right. That the Devils’ home scorer differed from all of the road scorers for quite a few of these stats, as well as from the NHL average in some cases, means that is a big hope.
This is ultimately why I still agree with Awad’s conclusion. No, this does not mean we should jettison Corsi, Fenwick, shot counts, WAR, or expected goals. (Aside: if the scorer is getting countable stats wrong, how are they with shot location - which is a big determining factor for scoring chances as well as for expected goals models?) Most of those have a logic and a process that makes sense. But they cannot necessarily make up for flawed data. I’d like to see adjustments developed so that we do not rely solely on road data (that was a thing for a time) or just take the data as-is and hope for the best. As with any system, questionable data in means questionable results out. But that doesn’t mean we should disregard the system entirely.
How can we make adjustments? I do not know. In theory, one could establish a baseline and work off of that. Of course, setting such a baseline up would likely be a theoretical exercise and not necessarily based on reality. Counting shots and other stats independently of the team could work; but that requires a lot of time, energy, effort, and common definitions. There was a movement to do exactly that for scoring chances; but even with a defined way of doing it, it was not easy, and that movement died out around 2012, at about the time Eric Tulsky posted this at NHL Numbers. Chances are now driven by shot location, which is in the metadata of the NHL play-by-play log anyway. Are adjustments possible? Yes. It appears they can be done, based on some of the shifts in counts for the Devils’ scorer - especially within this post. But without any idea of what the change was or how it was done, it is not clear how to do so in a reasonable fashion. And I do not know what would drive a change at all. Is it at the team level? The league? The analytics people on teams? The fans? And what would be the incentive for change, other than having more confidence in a dataset for analysis that is, for all intents and purposes, a niche part of hockey fandom?
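One commonly proposed correction is a rink-adjustment factor: compare what a rink's home scorer records against what those same teams' games produce on the road, and scale the home counts by that ratio. Here is a minimal sketch of that idea with made-up inputs; it assumes road scorers are collectively unbiased, which is itself only a rough baseline, not a proven fact:

```python
# Naive rink adjustment: scale a rink's home counts by the ratio of
# road totals to home totals over the same set of teams/seasons.
def rink_factor(home_counts, road_counts):
    """Ratio of road to home event totals for comparable games.

    A factor above 1.0 suggests the home scorer undercounts the event.
    """
    return sum(road_counts) / sum(home_counts)

def adjust(count, factor):
    """Scale a raw home count by the rink factor."""
    return count * factor

# Hypothetical per-season block totals at one rink vs. the same
# teams' road games (placeholders, not actual Corsica data).
home = [1000, 980, 1020]
road = [1200, 1180, 1220]

factor = rink_factor(home, road)  # 3600 / 3000 = 1.2
print(round(adjust(1000, factor)))  # a 1000-block home count scales to 1200
```

The hard part is not the arithmetic; it is justifying the baseline, which is exactly the problem described above.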
Until then, I have to conclude what I thought back in 2010. Scorer bias is still very much a thing. It is so with the Devils, albeit different from what it was back then or even two to four seasons ago. It appears to be so within the league as well. And as such, there needs to be some concern about how much value we put in various stats - especially if models are going to be built around them. Shots and shot attempts are one thing; giveaways and takeaways are another. Either way, the castles are still built on sand.
Thanks to Corsica for having the data available and thank you for reading.