Quarantining without hockey forces you to dig into the vault of article ideas.
I have been toying around with an idea for a little while and I figured now would be a good time to give it a go.
A big pet peeve of those in the hockey analytics community is the fact that we only have decent data going back to the 2007 season. This is when shot attempts and their locations started getting tracked (producing the Corsi and xG metrics, among others) and the shifts of every player started being recorded (producing ‘on-ice’ metrics like CF%Rel or RAPMs). A very common gripe about this is that we will never know just how good Pavel Datsyuk would’ve been in his prime.
So this means that, before 2007: No Corsi. No RAPMs. No GAR.
I know, bleak world right? Worst news you’ve probably heard all week. Top 2, at least.
I’ve always thought that there was probably enough info in old statistics to get at least a good estimate of the WAR numbers. So I finally decided to give that a go. I scraped Hockey-Reference for all data available going back to 1983 and used Evolving-Hockey’s GAR metrics going back to 2007. I broke GAR down into 3 distinct categories: Offensive GAR (EVO + PPO), Defensive GAR (EVD + SHD), and Penalty GAR (Take + Draw). I tried to “predict” these measures using statistics available going back to 1983: Age, Position (F/D), GP, G, A, PTS, +/-, PIM, EVG, PPG, SHG, EVA, PPA, SHA, EVP, PPP, SHP, S, S%, OPS, DPS, and PS. I also added a stat of my own that we’ll call ‘Weighted +/-’. It’s basically Relative +/-, calculated by subtracting the GP-weighted average +/- for that team-season from the player’s own +/-. These are the factors that ended up being included in the 3 GAR models:
OGAR: EVG, EVA, PPG, PPA, OPS, Shots, Age, and Position.
DGAR: SHP, DPS, OPS, Weighted +/-, and Position.
PGAR: PIM, OPS, GP, Age, and Position.
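For anyone who wants to rebuild the inputs, here’s a minimal sketch of the two derived pieces described above: grouping Evolving-Hockey’s GAR components into the three targets, and computing Weighted +/- as the player’s +/- minus the GP-weighted average +/- of his team-season. The dictionary keys (team, season, gp, plus_minus) are hypothetical names chosen for this sketch, not the actual column names from either site.

```python
from collections import defaultdict


def gar_components(evo, ppo, evd, shd, take, draw):
    """Group Evolving-Hockey GAR pieces into the three modeled targets."""
    return {"OGAR": evo + ppo, "DGAR": evd + shd, "PGAR": take + draw}


def add_weighted_plus_minus(rows):
    """Attach 'weighted_pm' = player +/- minus the GP-weighted team-season average +/-.

    Each row is a dict with hypothetical keys 'team', 'season', 'gp' and 'plus_minus'.
    """
    gp_total = defaultdict(int)
    pm_total = defaultdict(float)
    for r in rows:
        key = (r["team"], r["season"])
        gp_total[key] += r["gp"]
        pm_total[key] += r["gp"] * r["plus_minus"]

    out = []
    for r in rows:
        key = (r["team"], r["season"])
        # GP-weighted mean +/- for this team-season (0.0 if nobody played a game).
        team_avg = pm_total[key] / gp_total[key] if gp_total[key] else 0.0
        out.append({**r, "weighted_pm": r["plus_minus"] - team_avg})
    return out
```

For example, if one skater played 80 games at +10 and a teammate played 40 games at -5, the GP-weighted team average is (800 - 200) / 120 = +5.0, so their Weighted +/- values come out to +5.0 and -10.0.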
As you may expect, the offensive model predicted GAR more accurately than either of the other two. But the other two weren’t sheer randomness either: we seem to have enough information to ‘explain’ over a quarter of the variance in DGAR and almost half of the variance in Penalty GAR. Below you’ll see the predicted GAR components (which I’m calling ‘Naive GAR (nGAR)’ because of the simplicity of the model and underlying metrics) graphed against the GAR components they aim to predict. Remember that this is only using Devils seasons and skaters.
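The post doesn’t spell out the model type, but the talk of r-squared and statistically significant terms suggests a linear regression. Here’s a minimal sketch assuming plain OLS with an intercept, one model per component, using the feature lists above. The column names are hypothetical, and "is_D" is an assumed 0/1 dummy standing in for Position.

```python
# Feature lists from the three models above. Column names are hypothetical;
# "is_D" is an assumed 0/1 dummy for Position (1 = defenseman).
FEATURES = {
    "OGAR": ["EVG", "EVA", "PPG", "PPA", "OPS", "S", "Age", "is_D"],
    "DGAR": ["SHP", "DPS", "OPS", "weighted_pm", "is_D"],
    "PGAR": ["PIM", "OPS", "GP", "Age", "is_D"],
}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, features, target):
    """OLS with an intercept via the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(r[f]) for f in features] for r in rows]
    y = [float(r[target]) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row, features):
    """Intercept plus coefficient-weighted features for one player-season."""
    return coefs[0] + sum(c * float(row[f]) for c, f in zip(coefs[1:], features))
```

The normal equations are fine at this scale, but they can get numerically touchy when features are highly collinear (EVG, PPG, and OPS likely are). A statistics library such as statsmodels would also report the significance tests mentioned in the post.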
There is clear signal in all three components, and each model is built almost entirely from statistically significant predictors, which makes these models acceptable proxies for our purposes. For those interested, I also did out-of-sample testing, and the r-squared values were 0.63, 0.23, and 0.28, respectively. That doesn’t change much about how we should view these models, except that Penalty GAR should probably be treated as just as mediocre a proxy as Defensive GAR.
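The post doesn’t say how the holdout was built, so here’s a minimal sketch assuming a single random train/test split: fit on the training rows only, then score the held-out rows with R² = 1 - SS_res / SS_tot (using the test-set mean, one common convention). The fit and predict arguments are placeholders for whatever model you use.

```python
import random


def train_test_split(rows, test_frac=0.25, seed=0):
    """Shuffle a copy of rows and hold out test_frac of them for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, using the mean of y_true. Can be negative out of sample."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def out_of_sample_r2(rows, fit, predict, target, test_frac=0.25, seed=0):
    """Fit on the training rows only, then score predictions on the held-out rows."""
    train, test = train_test_split(rows, test_frac, seed)
    model = fit(train)
    preds = [predict(model, r) for r in test]
    return r_squared([r[target] for r in test], preds)
```

A drop from in-sample to out-of-sample R², like Penalty GAR going from almost half to 0.28, is the usual sign that part of the in-sample fit was noise.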
Using this combination of models, we can construct a Naive GAR for all seasons. When we do so, here are the top 15 skater seasons in Devils history.
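Assuming Naive GAR is simply the sum of the three predicted components (mirroring how the GAR pieces add up), the ranking step is short. This sketch uses hypothetical keys nOGAR, nDGAR, and nPGAR for the per-season predictions.

```python
def naive_gar(season):
    """Sum the three predicted components; keys are hypothetical."""
    return season["nOGAR"] + season["nDGAR"] + season["nPGAR"]


def top_seasons(seasons, n=15):
    """Attach nGAR to each season and return the n best, highest first."""
    scored = [{**s, "nGAR": naive_gar(s)} for s in seasons]
    return sorted(scored, key=lambda s: s["nGAR"], reverse=True)[:n]
```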
There are a lot of comparably excellent offensive seasons in Devils history, but Elias’s 96-point 2000-01 season was a cut above the rest because he put it up in a season when the average number of goals per game was 2.76, close to the lowest it has ever been and 0.2 goals per game fewer than in Hall’s MVP season. But that’s not all that made it special. While putting up the best offensive season in franchise history, he also produced the 8th best defensive season in franchise history and the best of the 15 seasons shown. The only season in the top 15 that came even close was Elias’s 2003-04 season.
There is a fair amount of error associated with these estimates, and simply showing rankings can be misleading about how close the results really are. To honor that grey area, I produced the ridgeplots you see below. They show the distributions of 500 simulations built from each season’s Naive GAR and the error associated with it. The curves are meant to represent an 80% confidence interval.
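The post doesn’t specify the error model behind the 500 simulations. This sketch assumes normally distributed error around each season’s nGAR, with a standard deviation built from the three component models’ residual errors under an assumed independence, and reads the 80% interval off as the 10th and 90th percentiles of the draws. The plotting step (a density curve per season) isn’t shown.

```python
import math
import random
import statistics


def combined_sigma(sigma_o, sigma_d, sigma_p):
    """Error of the summed nGAR, assuming the three component errors are independent."""
    return math.sqrt(sigma_o ** 2 + sigma_d ** 2 + sigma_p ** 2)


def simulate_season(ngar, sigma, n_sims=500, seed=None):
    """Draw n_sims plausible 'true' GAR values around the nGAR point estimate."""
    rng = random.Random(seed)
    return [rng.gauss(ngar, sigma) for _ in range(n_sims)]


def interval_80(draws):
    """Return the 10th and 90th percentiles, i.e. the central 80% of the draws."""
    deciles = statistics.quantiles(draws, n=10)
    return deciles[0], deciles[-1]
```

Two seasons whose 80% intervals overlap heavily, like the 2nd through 6th here, can’t really be separated by this model.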
There’s a fair amount of overlap in the majority of these seasons. The 2nd through 6th best seasons overlap quite a bit, as do the 7th through 15th. But Patty stands alone at the top, virtually untouched. Apropos for this, the day after St. Patrick’s Day.
Happy St. Patty’s Day to our favorite Patty. And may Saint Patrick endow him with the Hall of Fame enshrinement he so rightly deserves.
~~~~~~~~~~~~~~~
What do you guys think? Is there a season I didn’t mention or list? Did the great defensive seasons get shafted by this model? Is Patty the best Devils forward of all time and was this the best season of all time? How’s your quarantine/sports hiatus going?
Thanks for reading and leave your thoughts in the comments section below.