Blake Snell, the Cy Young, and why we have all these WARs

JT broke down what all these WAR metrics are, but I ask why? Spoiler Alert: the answer is Blake Snell.

Kansas City Royals v Tampa Bay Rays Photo by Mike Ehrmann/Getty Images

Those who watched Wednesday night’s Rays game will undoubtedly recall Rays analyst Brian Anderson (BA) going on an inning-long not-quite-rant questioning how WAR could have Blake Snell as the 10th-best pitcher in the American League.

Now, to BA’s credit, he admitted that he didn’t fully understand how WAR is calculated, but to his not-credit, he continued to rail against the statistic to the broad audience watching. I’ll be honest and say it rubbed me the wrong way (for all of 60 seconds).

However, once again to his credit, BA seems quite intrigued by the statistic, rather than entirely dismissive. When play-by-play man Dewayne Staats noted that there are different versions of WAR and Snell ranks much higher by some of them, BA noted that he would find it interesting to read an article breaking down why those differences exist.

Well, BA: Ask and ye shall receive (two whole articles’ worth)!

The History of WAR

WAR means wins above replacement. The theory behind the creation of the statistic comes from Bill James’ early work calculating player value. The intent of the stat is to place the entirety of a player’s value — his ability to reach base, his ability to hit for power, his defensive prowess, the positional value he brings to the team, and the context (league, park, etc.) in which he does all that — into one simple stat. The stat is a counting stat (not a rate stat such as batting average or OPS), and it is based around the idea of a theoretical “replacement-level player” who has the skill of the 26th man on the MLB roster. The line right between a big-league player and a minor leaguer. Any value above that “replacement level” is positive, any value below it is negative. A 2-win season is average. A 5-win season makes you right around an All-Star. An 8-win season has you in the MVP conversation. A 10-win season makes you Mike Trout.
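For readers who like to see a rule of thumb written down, here is a minimal sketch of that scale as a lookup. The function name `war_tier` and the exact cutoffs between the benchmarks are my own assumptions for illustration; the benchmarks themselves (0, 2, 5, 8, 10) are the rough ones listed above, not an official scale from any of the sites.

```python
def war_tier(war: float) -> str:
    """Map a season WAR total to the rough tiers described in the text.

    The boundaries are illustrative assumptions: the article only gives
    approximate benchmarks, not hard cutoffs.
    """
    if war < 0:
        return "below replacement level"
    if war < 2:
        return "above replacement, below average"
    if war < 5:
        return "average or better"
    if war < 8:
        return "around All-Star level"
    if war < 10:
        return "MVP conversation"
    return "Mike Trout"


if __name__ == "__main__":
    # A few sample values spanning the scale.
    for value in (-0.5, 1.0, 2.0, 5.6, 8.3, 10.2):
        print(f"{value:>5.1f} WAR -> {war_tier(value)}")
```

Because WAR is a counting stat, these tiers only make sense for full-season totals; a part-season number will look lower than the player's eventual tier.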

Just from that definition, one can see the appeal of the metric. One can also see the danger. Creating a stat that is supposedly as all-encompassing as WAR would seem to create too simplistic an approach to baseball analysis. With great power comes great responsibility, and thankfully any sabermetric fan worth her/his salt is never going to rely simply on WAR to be the be-all and end-all in evaluating a player.

WAR is simply one tool (hold that thought) in the analytical baseball fan’s shed. No good car mechanic would rely solely on her impact wrench, no matter how freakin’ useful the WAR/impact wrench may be — but it’s a good tool and one worth exploring.

Now that we have some background, let’s tackle the elephant in the room when it comes to WAR: the three different versions.

Where we are today

While BA was initially ranting against WAR simply because it ranked Snell 10th in the AL, he was very right in wondering how such an amazing and useful statistic [his sarcasm added] could vary so vastly from website to website.

This is a question I have received from many saber-skeptic friends of mine, and it actually is a great question! It points to why no analytical baseball fan would ever suggest that we have “figured out the game,” as Jayson Werth and others may assume most statheads feel.

In fact, having three main versions of the most popular all-encompassing stat on the baseball market just drives statistically inclined baseball fans to do even more research, because there is clearly so much more to be learned.

Tampa Bay Rays v New York Yankees Photo by Al Bello/Getty Images

JT Morgan took up the task of breaking down each version earlier today, and I encourage you to read his work if you’d like all the details behind the three, but I’d like to ask a different question: Not what makes up each of these stats, but why are there three different versions of wins above replacement at all?

Meet the contestants

The three main versions of WAR currently available to baseball fans are: Baseball-Reference WAR, FanGraphs WAR, and Baseball Prospectus WARP, and even if I’m encouraging you to read JT’s breakdown, we need a recap to move on.

Now, all of these websites feel the need to give 1,000-word definitions, and those are something JT did not break down, so I’ll make note of them here.

We’ll just post a paragraph each, though, and you can feel free to click the link if the way you learn best is 1,000-word diatribes before moving on to some analysis.

Here is how Baseball-Reference defines their version, rWAR:

“At its most basic level, our pitching WAR calculation requires only overall Runs Allowed (both earned and unearned) and Innings Pitched. Since we are trying to measure the value of the pitcher’s performance to his team, we start with his runs allowed and then adjust that number to put the runs into a more accurate context.”

And here’s FanGraphs, fWAR:

“Calculating WAR for pitchers is conceptually straightforward, but there are many steps and a lot of notation to follow. Generally speaking, the first thing you need is some estimate of your pitcher’s value relative to league average. There are all sorts of different approaches to selecting this number. FanGraphs uses Fielding Independent Pitching (FIP), with a few adjustments, but you could use RA9, DRA, or any other metric related to pitcher performance. This post does not address the merits of choosing FIP for use in WAR.”

And finally Baseball Prospectus, which does not have a pitcher-specific WAR definition, but offers a general definition of their WARP:

“Perhaps no sabermetric theory is more abstract than that of the replacement-level player. Essentially, replacement-level players are of a caliber so low that they are always available in the minor leagues because the players are well below major-league average. Prospectus’ definition of replacement level contends that a team full of such players would win a little over 50 games. This is a notable increase in replacement level from previous editions of Wins Above Replacement Player.”

Now, if you’re like me, your eyes glazed right over those definitions. So, next, we will go by the Michael Scott Rule.

For those who don’t watch The Office, there is a scene in which Oscar, who works in finance, is explaining why they should spend their surplus to Michael. Michael can’t figure out Oscar’s big words, so he asks that Oscar “explain it to me like I’m five years old.”

It’s a perfect scene because it encapsulates how many of us feel when we are overwhelmed with a new idea — like sabermetrics. I personally use the Explain It To Me Like I’m Five reddit page more frequently than I care to admit.

The beauty of this rWAR (Baseball-Reference), fWAR (FanGraphs), and WARP (Baseball Prospectus) comparison is that it is quite literally easy enough to explain to a five-year-old.

rWAR is based around runs allowed (think ERA, but counting unearned runs too), fWAR is based around FIP, and WARP is based around DRA.

Everyone knows ERA, so we don’t need to give much of an explainer there, but FIP and DRA may be new to some of you. FIP is quite straightforward, though. It stands for Fielding Independent Pitching. It is a stat that distills pitching down to strikeouts, walks, hit batters, and home runs, aka what the pitcher can most control, and it works on the assumption that a lot of what happens once the ball is put in play is noise, or simply good or bad luck.
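To make the ERA-versus-FIP contrast concrete, here is a small sketch using the standard public formulas. The standard FIP formula counts home runs and hit batters alongside strikeouts and walks. The stat line below is entirely hypothetical (it is not Snell's), and the `constant` of 3.10 is an assumed placeholder: the real FIP constant is recalculated each season so that league FIP matches league ERA.

```python
def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    Only home runs, walks, hit batters, and strikeouts are counted;
    balls in play are ignored. `constant` is an assumed placeholder
    for the season-specific league constant.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


if __name__ == "__main__":
    # Hypothetical season line, with innings given as a plain decimal
    # (180.0 means exactly 180 innings, not baseball's ".1/.2" notation).
    innings = 180.0
    print(f"ERA: {era(earned_runs=40, innings=innings):.2f}")
    print(f"FIP: {fip(hr=16, bb=60, hbp=5, k=220, innings=innings):.2f}")
```

With this made-up line the pitcher's ERA comes out almost a full run lower than his FIP, which is exactly the kind of gap that pushes a pitcher up the rWAR board and down the fWAR one.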

DRA (Deserved Run Average) is a lot fancier, and needs a lot more time to explain, but all you need to know for now is that it is Baseball Prospectus’ attempt to perfect ERA by including factors such as the quality of the opponent. (Many people do indeed believe it is the best at doing so.)

Distilled down to its simplest comparison:

  • rWAR is the pitcher’s value based on what actually happened on the scoreboard, full stop.
  • fWAR is the pitcher’s value based on the more theoretical idea of what “deserved” to happen on the scoreboard, with context of the environment.
  • WARP is the pitcher’s value based on what happened on the scoreboard, with context of many factors, including opponent, park, and many more.

That’s it.

Case in Point: Blake Snell

So for a bit of fun after all that work, let’s look at how these three compare in a real-life situation: the aforementioned Blake Snell, whose 2018 season started this whole conversation! Let’s look at him in the context of the best pitchers in the American League in 2018.

Here’s Snell ranking third among all AL pitchers in rWAR:

2018 rWAR AL Pitchers Leaderboard

Rank Pitcher Value Key stat (ERA)
1 Chris Sale 6.1 1.97
2 Trevor Bauer 5.9 2.22
3 Blake Snell 5.6 2.07
4 Corey Kluber 5.5 2.74
5 Luis Severino 4.6 3.28
6 Justin Verlander 4.4 2.65
7 Mike Clevinger 4.3 3.25
8 Gerrit Cole 4.2 2.73
9 Mike Fiers 3.9 3.21
10 Jose Berrios 3.5 3.69

Here’s us needing to extend the table to 11 (like Spinal Tap) just to see our leading man on the fWAR leaderboard:

2018 fWAR AL Pitchers Leaderboard

Rank Pitcher Value Key stat (FIP)
1 Chris Sale 6.1 1.95
2 Trevor Bauer 5.9 2.38
3 Gerrit Cole 5.4 2.57
4 Justin Verlander 4.5 3.16
5 Luis Severino 4.4 3.08
6 Corey Kluber 4.2 3.25
7 Carlos Carrasco 3.7 3.10
8 James Paxton 3.4 3.13
9 Mike Clevinger 3.4 3.49
10 Charlie Morton 3.0 3.45
11 Blake Snell 3.0 3.29

And here’s the Goldilocks of WAR, with Snell landing comfortably between those two extremes, in seventh place in WARP:

2018 WARP AL Pitchers Leaderboard

Rank Pitcher Value Key stat (DRA)
1 Justin Verlander 5.65 2.41
2 Chris Sale 5.50 2.03
3 Gerrit Cole 5.46 2.42
4 Trevor Bauer 5.46 2.45
5 Corey Kluber 5.15 2.74
6 Luis Severino 4.53 2.83
7 Blake Snell 4.28 2.63
8 James Paxton 4.25 2.65
9 Carlos Carrasco 3.26 3.32
10 Dallas Keuchel 3.23 3.56

With these three tables, we’re able to see just how differently Snell’s 2018 value can be perceived depending on which metric is doing the valuing.

It’s interesting to note that Snell is the ideal case study for this comparison in 2018, as no other pitcher who appears on all three AL leaderboards moves around in the rankings as much as he does.
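If you want to check that claim yourself, here is a quick sketch that measures how far each pitcher moves across the three leaderboards above. The rankings are copied from the tables in this article; the "spread" measure (worst rank minus best rank) is just my own simple choice, and it only covers pitchers who appear on all three lists as shown.

```python
# Leaderboard order copied from the three tables above (rank 1 first).
RWAR = ["Chris Sale", "Trevor Bauer", "Blake Snell", "Corey Kluber",
        "Luis Severino", "Justin Verlander", "Mike Clevinger",
        "Gerrit Cole", "Mike Fiers", "Jose Berrios"]
FWAR = ["Chris Sale", "Trevor Bauer", "Gerrit Cole", "Justin Verlander",
        "Luis Severino", "Corey Kluber", "Carlos Carrasco", "James Paxton",
        "Mike Clevinger", "Charlie Morton", "Blake Snell"]
WARP = ["Justin Verlander", "Chris Sale", "Gerrit Cole", "Trevor Bauer",
        "Corey Kluber", "Luis Severino", "Blake Snell", "James Paxton",
        "Carlos Carrasco", "Dallas Keuchel"]


def ranks(board):
    """Turn an ordered leaderboard into a {pitcher: rank} mapping."""
    return {name: position for position, name in enumerate(board, start=1)}


boards = [ranks(board) for board in (RWAR, FWAR, WARP)]

# Only pitchers listed on all three boards can be compared fairly.
common = set(boards[0]) & set(boards[1]) & set(boards[2])

# Spread = worst rank minus best rank across the three metrics.
spread = {
    pitcher: max(b[pitcher] for b in boards) - min(b[pitcher] for b in boards)
    for pitcher in common
}

if __name__ == "__main__":
    ordered = sorted(spread.items(), key=lambda item: (-item[1], item[0]))
    for pitcher, gap in ordered:
        positions = "/".join(str(b[pitcher]) for b in boards)
        print(f"{pitcher:<18} rWAR/fWAR/WARP ranks {positions:<8} spread {gap}")
```

Snell tops the list with a spread of eight spots (3rd, 11th, 7th), ahead of Gerrit Cole and Justin Verlander at five each.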

Houston Astros v Tampa Bay Rays Photo by Julio Aguilar/Getty Images

Which WAR is best?

Personally, I see the comparison between these value metrics as a bit of a personality test, and the fun part is, you can stake your claim to one (I’m a rWAR man), and then debate the merits of each with friends and colleagues who may prefer fWAR or WARP.

We don’t have a simple answer as to which WAR is best, and while many anti-statistics folks would tell you that the fun of arguing about baseball has been demolished by the “Chernobyl of statistics” that we currently have (looking at you, Bill James!), this Snell case study is proof perfect that debates can still be had — even within the heady, statistical realm of baseball analysis.

Blake Snell’s Cy Young Application:

  • When you look at how Snell performed based on the runs that crossed home plate, gosh he easily looks like one of the best (that’s rWAR)!
  • When you look at how a pitcher should have performed based on the things he controls most (his strikeouts, walks, and home runs allowed), Snell is good but not in the conversation for best (that’s fWAR).
  • And when you try to amalgamate both ideas, with a heavy dose of what the opponent is capable of, Snell is great but maybe not the best just yet (that’s WARP).

How much does that comparison vary? Depending on your perspective, Blake Snell could be near, far, or right in the Cy Young conversation in 2018, and all three interpretations of his performance are fair.

This is why we have three different ways to grip the same stat, and it’s why we will all stay at war with WAR for the foreseeable future.