Evaluating Pitchers: The Wonders of DIPs
So far in this series, we've tackled luck, wOBA, and WAR, but now it's time we got into something a little more difficult - pitching. Before we get into the nitty-gritty of how to analyze pitchers, here's something I think all of us can agree on: evaluating pitchers is damn hard. Pitchers are some of the most fickle players in all of baseball, and their performances can vary wildly from year to year. A pitcher may be mediocre one year, but then bust out and become one of the best in all of baseball the next year (ahem, Cliff Lee, Randy Johnson). Some may burst onto the scene as perennial All-Stars, but flame out due to injuries and general ineffectiveness (cough, Scott Kazmir). Some may have one star season, but then struggle to ever recapture that glory and effectiveness (Dontrelle Willis and Oliver Perez, anyone?). Unlike position players, who have been shown to - on average - peak around age 27 and slowly decline afterwards, pitchers have no set "aging curve." Some peak at 22, others peak at 32. It's friggin' insane.
And so, considering that we all know how volatile and unpredictable pitchers are, when evaluating a pitcher we want to find statistics that predict the future the best. How well should this pitcher perform going forward? That's the question we want answered and although no statistic - or even clump of statistics - is going to be even 80% accurate, we'll take whatever we can get.
In the late '90s, this guy called Vorus McCracken came up with a radical new idea: let's evaluate pitchers only on things they have direct control over. Instead of focusing on Wins - which depends upon the pitcher's team to score runs - and ERA - which depends upon the quality of the defense behind the pitcher - let's take a look at the outcomes that only involve the pitcher and batter: strikeouts, walks, and homeruns. How well do these three things predict future success for a pitcher? Surprisingly to all, including himself, they worked quite well and launched a new branch of baseball analysis: defense independent pitching (DIPs)*.
*Brief interlude: for those who are unfamiliar with the concept, you may be asking yourself, "Why do we need to separate pitchers from their fielders? Doesn't ERA account for fielding because it only counts earned runs?" Yes and no. ERA accounts for some fielding, disregarding runs that are the result of an error in the field, but how many times have you seen Carl Crawford make a fantastic catch out in left field that turned a double into an out? If the Rays had any other outfielder out there in his place, those balls would fall in for hits and the Rays' starters would most likely allow more runs to score. Like umpires, defense is invisible to the casual fan unless it's really, really good or really, really bad, but it can have a large impact on a pitcher's results.
These days, there are a number of DIP statistics, all of which are attempting to do the same thing: predict as accurately as possible how a pitcher will perform in the future. Since there's no one perfect statistic out there, let's run through three of the most common ones: FIP, xFIP, and tERA.
Fielding Independent Pitching (FIP):
The gold standard for DIPs statistics, FIP uses McCracken's three variables - strikeouts, walks, and homeruns - to calculate what a player's ERA should have been over a given time period. It's scaled to look exactly like ERA, so it's easy to tell what's a good value and what isn't, but it's a much better predictor of future success than ERA. Does that mean it's perfect? No, but it's a step in the right direction and one of the best DIPs stats available at the moment.
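For the curious, here's a rough sketch of how that calculation works. The commonly published form of FIP weights homeruns at 13, walks at 3, and strikeouts at -2 per inning pitched, then adds a league constant so the result sits on the ERA scale. The function name, the sample stat line, and the 3.20 constant are assumptions for illustration only: the real constant is recalculated for each season, and some versions also count hit batters alongside walks.

```python
def fip(hr, bb, k, ip, constant=3.20):
    """Fielding Independent Pitching in its commonly published form.

    `constant` is a league-wide adjustment that puts FIP on the ERA
    scale. 3.20 is an assumed placeholder, not an official value for
    any particular season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Homeruns hurt the most, walks hurt some, strikeouts help.
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical stat line: 20 HR, 50 BB, 200 K over 200 innings.
print(round(fip(hr=20, bb=50, k=200, ip=200), 2))
```

Notice that hits on balls in play never enter the formula, which is the whole point: only the outcomes the defense can't touch are counted.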
Expected Fielding Independent Pitching (xFIP):
Whenever you see a tiny "x" before a statistic, that means the stat has been regressed to some degree. Since McCracken came up with his radical new idea, it's been shown that pitcher homerun rates are unstable as well. A pitcher may let up homeruns on 5% of his flyballs one year, but then let up homeruns on 15% of his flyballs the next year. There's no rhyme or reason to it, and all pitchers have their homerun rates fluctuate regardless of whether they're high-strikeout pitchers, induce lots of groundballs, or are among the best in the league. For example, Roy Halladay let up homeruns on 10.6% of his flyballs last season, but is at only 8.9% this season; James Shields let up homeruns on 9.8% of his flyballs in 2008, but is at 14.3% this season. Since this statistic is volatile, xFIP says, "Screw this, we want stability!" and regresses every pitcher's homerun rate to league average (10.6%). This has been shown to have slightly more predictive value than FIP by itself.
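Here's a minimal sketch of that regression step, assuming the common FIP weights (13 for homeruns, 3 for walks, -2 for strikeouts) and a placeholder league constant of 3.20. The pitcher's actual homerun total is swapped out for his flyballs multiplied by the league-average homerun rate (the 10.6% quoted above). The function names and the sample stat line are hypothetical.

```python
LEAGUE_HR_PER_FB = 0.106  # league-average HR/FB rate quoted in the article


def expected_home_runs(fly_balls, league_hr_per_fb=LEAGUE_HR_PER_FB):
    """Homeruns a pitcher 'should' have allowed at the league HR/FB rate."""
    return fly_balls * league_hr_per_fb


def xfip(fly_balls, bb, k, ip, constant=3.20,
         league_hr_per_fb=LEAGUE_HR_PER_FB):
    """FIP with actual homeruns replaced by expected homeruns.

    `constant` is an assumed placeholder for the league adjustment that
    puts the result on the ERA scale.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    exp_hr = expected_home_runs(fly_balls, league_hr_per_fb)
    return (13 * exp_hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical pitcher: 200 flyballs, 50 BB, 200 K over 200 innings.
print(round(xfip(fly_balls=200, bb=50, k=200, ip=200), 3))
```

Two pitchers with the same strikeouts, walks, innings, and flyballs get the same xFIP no matter how many balls actually left the yard, which is exactly the stability the stat is after.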
True Earned Runs Allowed (tERA):
Oh boy, this is where things get interesting. tERA incorporates the same statistics as FIP - strikeouts, walks, and homeruns - but it goes a step further by also including batted ball statistics like line drive rate (LD%), flyball rate (FB%), and groundball rate (GB%). The theory behind it is simple: if a pitcher lets up lots of flyballs and line drives, they'd also be more likely to let up lots of homeruns and doubles, and that's bad. But if a pitcher makes batters hit groundballs all the time, they'll be more likely to get outs and be effective. Pitchers do have some control over their groundball and flyball rates, so we should include those in our calculations as well. tERA is also based on an ERA scale, making it easy to tell what's a good score and what's not.
***
Those are all very simplistic descriptions of the above statistics, so if you have any additional comments or questions, please let us know in the comments. Also feel free to check out the pages on FIP, xFIP, and tERA on The Sabermetrics Library.
Comments
FIP and xFIP are referenced all of the time here
but I’ve never seen tERA discussed. Is there any particular reason why? It seems like a pretty good stat at first glance.
tERA is good
I think a big part of it is familiarity. More people know about FIP than tERA, and for a while tERA was only available in runs form (tRA) and was tougher for people to understand. I use FIP and xFIP primarily because I’m more comfortable with those two and I always forget about tERA.
I love Casey Fossum. Now try and take me seriously.
by Steve Slowinski on Jul 15, 2010 10:34 AM EDT
The runs scale makes so much more sense though
by Graham MacAree on Jul 15, 2010 10:52 AM EDT
It does make sense
However, people like to see it compared to FIP, ERA, xFIP, so it’s handy to have them all on the same scale. It’s easier (lazier) to adjust one than three.
Follow Me on Twitter @FreeZorilla
by FreeZorilla on Jul 15, 2010 10:54 AM EDT
I don't mind that much
Whatever works for people works
by Graham MacAree on Jul 15, 2010 11:00 AM EDT
It is much better...much more accurate way of thinking about things too.
I think the public will catch up eventually. It takes some getting used to.
by Steve Slowinski on Jul 15, 2010 11:03 AM EDT
One reason is that tRA/tERA break out LD/GB/FB rates. And there really isn't much of a skill to LD rates.
tRA* is probably the best option
by Sky Kalkman on Jul 15, 2010 10:33 AM EDT
Which is only available at statcorner as far as I know
It’s also on the RA scale as opposed to ERA, so quick cross comparisons can be a bit murky (though I’m sure a simple adjustment exists)
by FreeZorilla on Jul 15, 2010 10:52 AM EDT
Wouldn't it just be as easy as dividing the runs allowed by 9?
At least to get a ball park number.
by firemangreg on Jul 15, 2010 11:06 AM EDT
Sure you would just figure out lg average RA and adjust the constant
by FreeZorilla on Jul 15, 2010 11:12 AM EDT
For tRA* right?
For tRA you wouldn’t even need to adjust it, would you?
by firemangreg on Jul 15, 2010 11:14 AM EDT
I'm talking about comparing it w/ FIP, xFIP or ERA
by FreeZorilla on Jul 15, 2010 11:20 AM EDT
Multiply by .92 to go from RA to ERA scales.
Divide by .92 to go the other way.
by Sky Kalkman on Jul 15, 2010 11:16 AM EDT
That's the general rule.
About 92% of runs are earned.
It’ll change based on era, park, defense, and pitcher (groundball pitchers induce more errors). But I’ve yet to find an application where I care.
by Sky Kalkman on Jul 15, 2010 12:38 PM EDT
There's a pretty important distinction between 'defense-independent' and 'predictive'
If you wanted to see how well a pitcher has actually played, you don’t really want to eliminate luck at all – you want to try to get rid of defense while keeping the luck part (line drives, home runs, etc) intact. This means using tools like FIP and tRA, and it’s where tRA really shines.
It’s when we get to predictive stats where we want to start regressing our numbers, and we should be regressing more or less everything. Strikeouts, walks, ground ball rate, home runs per non-ground ball, all of that.
Anyway, my point is that what people call DIPs really consists of two separate categories of statistics – the descriptive and the predictive, and I think people miss that sometimes.
by Graham MacAree on Jul 15, 2010 11:05 AM EDT
Interesting interpretation of FIP...
I’ve always viewed FIP as a “should have been” metric, being all-encompassing and similar in scope to ERA.
But Tango’s said it’s just a piece of the puzzle. It’s not trying to represent all of a pitcher’s production, just the K/BB/HR pieces. It’s simply a measure of those three things and is choosing to ignore other things like BABIP and sequencing, NOT claiming they aren’t skills. Sort of like looking at K/BB ratio. Nobody claims that is an all-encompassing measure, just a piece of the puzzle.
A quick and easy one at that
That’s its beauty: it removes a lot of noise and is easy to calc
by FreeZorilla on Jul 15, 2010 12:49 PM EDT
Exactly...quick, easy,
and a good first piece of the puzzle to glance at before delving in deeper. Especially with pitchers, with them being so complicated, you can’t sum everything up with just one stat. It’s impossible.
by Steve Slowinski on Jul 15, 2010 12:59 PM EDT
I've always wondered if it could be possible in just one stat.
What if we looked at the trajectory of batted balls and their speed as well? Shouldn’t a pitcher who gets hit “harder” be worse than those who induce weak contact? Up to this point it seems that BABIP is discredited as simply “luck” for a pitcher, so we just remove it and focus on K/BB/HR. I believe there is more to the picture, and given more data there could be a more accurate metric. Something must be said for “pitching to contact” as long as that contact is weak.
Once we get Hit F/x data, I have a feeling this type of analysis will explode.
For now, it’s in the realm of wishful thinking. It’d be awesome, but there’s only so much data available now.
by Steve Slowinski on Jul 15, 2010 10:45 PM EDT
That's my problem with FanGraphs' pitcher WAR
It’s (as far as I know) solely based on FIP and innings pitched, which, while useful, is only a piece of the puzzle. I feel like because of that, FanGraphs’ pitcher WAR is shaky, and really shouldn’t be quoted as the gospel it seems to be in a lot of places.
by Matt Slowinski on Jul 15, 2010 6:13 PM EDT
A handily available composite average of the three would be HELLA convenient
Sort of an “average pitching stat that incorporates the top 3”. We can call it APSTITT3.
Oh, and Wikipedia thinks his “name” is Voros. I’ll let you two fight it out.
Would you like to follow me on Twitter, Facebook, or my blog...well you can't.
by SagehenMacGyver47 on Jul 15, 2010 9:53 PM EDT