
Random Thoughts On Statistics in Baseball

There have been lots of articles published within the last couple of weeks about statistics and their place in baseball.  Since we rely upon statistics in our analysis on this site so often, it's important for all of us - writers and readers alike - to have a good understanding of statistics as a field.  While it's important that we all have an understanding of what each individual statistic is showing us (like wOBA, ISO, FIP, RBI etc), it's also important that we understand statistics in general.  What do baseball statistics actually show us?  What do they mean?  How should we use them?  Why should we use them?  These are the big questions: the sabermetrician's existential equivalent of "What is life?"

After the jump I'm going to put down a couple of my own recent ponderings about statistics, but if nothing else, please check out the links below.  I've learned more about statistics from reading sabermetric articles than I ever did in any class, and these recent articles are all thought-provoking and insightful.  Read them, think about them, and feel free to post any questions you may have in the comments below.

Joe Posnanski - a great read on the Hall of Fame, and how statistics can be used to prove any number of different points.

Tom Tango - The Mike Silva Chronicles, questions from a saber-denier (already linked on DRB before, but amazing)

David Appelman - discussing stat saturation and "statistical scouting"

Dave Cameron - about the overlap between scouting and advanced statistics.  When you think about it, about half of the statistic categories on Fangraphs use scouting data.

Okay, and now for my haphazard thoughts on statistics.  My mind first started down this quasi-existential path a couple of weeks ago while reading the interview here on DRB with Erik Neander, Tampa Bay's Baseball Operations Assistant.  It was a great interview with lots of insight, but one quote in particular got me thinking (emphasis mine):

"E.N.: Blown saves are a useful description of an event that occurs during the course of a game, but like any statistic, they don't tell you everything about a player's performance.  It just depends on what you want to know.  If you're using blown saves to evaluate the quality of a reliever's performance, then they have the potential to be misleading, especially with relievers that aren't always used to record the last three outs of the game."


When you stop to think about it, statistics are a very odd sort of thing.  In short, they're an attempt to describe, through numbers, the actions and events that happen on the playing field.  That's all that every single statistic is: a description of physical events.  With some statistics, like RBIs and Errors, it's easy to see their relation to physical events, and that's why reading a box score can give you a good basic understanding of what happened in a game.  It describes what you could have seen with your own eyes at the game.  Just as books and newspaper articles capture moments in words, baseball stats capture them in numbers.

With some more advanced statistics, though, that connection becomes lost.  How exactly does wOBA relate to events happening on the field?  What about WAR?  I can't see a player physically add one WAR to their team, but I can see them hit a home run.  I can see a player make an error, but I can't see a -3 play on the Dewan +/- scale (unless you're trained to do so).  However, all these advanced statistics are based on events happening on the field too, even though they're tougher to actually see.  wOBA is based on the hits, walks, and hit-by-pitches that a player accumulates, each weighted by its run value, and UZR is based on how many balls a player gets to and where they're located.

And that leads me to my second quote, from the DRB interview with James Click, the Rays' Coordinator of Baseball Operations:

"JC: Sometimes people like to draw a line between "advanced" and "regular" statistics, but I don't necessarily see things that way. The difference between different stats can often be explained by the question the metric is attempting to answer. At the end of the day, it's all information and you have to know exactly what that information is telling you and what it isn't."

There are two big things that I get from that quote.  Number one: in short, all statistics are the same underneath.  They all have their own particular strengths and weaknesses, and they all originate from the physical events that are happening on the field.  Different statistics are simply attempting to answer different questions, and so they measure the events on the field...differently.  And that leads to the second big point: you have to understand enough about the statistic to know what specific question it's attempting to answer.  For example, while Errors ask the question, "How many times does a player mess up while attempting to field a ball?", UZR asks, "How good is a player's range, arm strength, and error rate?"  Different questions, different answers, and it's easy to be confused by the more evaluative, less descriptive UZR stat if you don't understand that.

One way that I've been visualizing this concept recently is as a continuum, with descriptive statistics on one end and evaluative statistics on the other:

What I'm trying to show with this picture is that all statistics, whether they're considered "advanced" or "traditional", contain varying amounts of both descriptive and evaluative characteristics.  In the example on the chart, ERA is a very descriptive statistic - showing how many earned runs a pitcher allowed per nine innings - but is not very evaluative, meaning it varies a lot from year to year and is a poor measure of a pitcher's underlying talent level.  FIP, on the other hand, is built from on-field results, but it goes beyond merely describing them and attempts to measure the underlying talent level of the pitcher.  It's less descriptive than ERA - what exactly does a 3.00 FIP equate to?  It's not as easy to say as it is with a 3.00 ERA - but it is more stable from year to year and a better evaluative tool.
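To make the ERA/FIP contrast concrete, here's a minimal Python sketch of both formulas. ERA is simply earned runs per nine innings. FIP, in its commonly published form, keeps only home runs, walks, hit batters, and strikeouts. The 3.10 constant is an assumption for illustration: the real constant is recalculated each season so that league-average FIP matches league-average ERA. Innings are passed as true decimals, so 180 1/3 innings would be 180 + 1/3, not the box-score shorthand 180.1.

```python
def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


def fip(hr: int, bb: int, hbp: int, k: int, innings_pitched: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching, in its commonly published form.

    Only home runs, walks, hit batters, and strikeouts enter the formula;
    hits on balls in play and the defense behind the pitcher are ignored.
    The default constant of 3.10 is an assumed ballpark value; the real
    constant is recomputed each season so league FIP equals league ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings_pitched + constant


# A hypothetical 180-inning season.
print(f"ERA {era(60, 180):.2f}")                # ERA 3.00
print(f"FIP {fip(20, 45, 5, 160, 180):.2f}")    # FIP 3.60
```

In this made-up season, the 3.00 ERA means something anyone can picture: three earned runs per nine innings. The 3.60 FIP compresses 20 home runs, 50 walks plus hit batters, and 160 strikeouts into one number that doesn't map to any single thing you'd see on the field.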

I've been trying to determine how I'd rank the major statistics on this continuum, but it's a lot tougher than I would have thought.  It may not be a straight-line relationship, since certain statistics are very descriptive but can also be quite predictive (like home runs and K/BB rate).  Maybe something more like a scatter plot would make more sense (note: the locations of statistics relative to others within the same box are not meant to be precise):

Anyway, where would you rank most of the major statistics on a plot like this?  Anything you disagree with?  I could keep on with this subject at great length - there have been multiple books written about baseball stats - but I merely wanted to explore a bit the connections between all baseball statistics.  Like James Click said, there is no such thing as "advanced" stats; there are simply lots of statistics and each of them answers a slightly different question.  There is nothing wrong with using RBIs or BA to make a point, just as long as you are using the statistics with a good understanding of their limitations, strengths, and the questions that they answer.  Any evaluation or discussion about a player, though, requires multiple statistics and looking at the whole picture, since individual stats can only tell you so much. 

All of that said, if you have a question about any statistic that you don't understand entirely, leave a note in the comments and we'll be happy to help.  There are a ton of stats out there and it can be tough to get an understanding of what they all measure and what they don't, so don't feel dumb.  It's a learning process, and one that I know I'm still working on myself.



Comments


Great Idea Steve

I’ve tinkered with the idea of an open Stats Q&A for a long time. We reference stats all the time without explaining them each time. We have the stats guide, but that still leaves plenty of room for questions, learning opportunities, and discussion. I ask new questions all the time. No question about how to calculate or apply stats is dumb. Hopefully this forum can serve as a chance to make us all wiser.

Follow Me on Twitter @FreeZorilla

by FreeZorilla on Jan 10, 2010 8:25 AM EST

Yeah, this is something we should keep coming back to.

I don’t know how much discussion we’ll get right now, but it’d be a good idea to revisit once the season gets going.

"I never threw an illegal pitch. The trouble is, once in a while I toss one that ain't never been seen by this generation." - Satchel Paige

by Steve Slowinski on Jan 10, 2010 4:07 PM EST

It might be helpful to include WPA in your graph

Which goes to show that even “advanced” stats can be descriptive and not evaluative. That makes it out to be less of a saber vs traditional argument, and might better illustrate your & Click’s point that it’s all about understanding what the stat shows.

Also, when you say “It’s less descriptive than ERA – what exactly does a 3.00 FIP equate to? It’s not as easy to say as it is with a 3.00 ERA”

I disagree somewhat with this statement… a 3.00 FIP equates to a given rate of walks, strikeouts, and homeruns, which I think is pretty understandable. What makes it confusing is that it specifically ignores events that contributed to the outcome of games, but weren’t controllable by the pitcher. So, someone with a 3.00 ERA had a good year, there’s no question about it. Someone who had a 3.00 FIP should have had a good year, but in reality may or may not have the results to match. They are, of course, more likely to have good years in the future than a pitcher with a higher FIP.

by ChiBurbRaysFan on Jan 10, 2010 11:15 AM EST

A few more things...

Aren’t ISO and wOBA pretty descriptive? ISO is a subtraction of two descriptive stats, so it should be just as descriptive as BA and SLG. And correct me if I’m wrong, but I think of wOBA simply as a way to combine all the slash stats into one number, while correctly weighting each component for the value it contributes (similar to OPS but better). wOBA is perhaps the most descriptive hitting stat, because it includes all the available information on how well a hitter performed in a given year. Since BA ignores walks and power, it should actually be less descriptive, right?
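The point about ISO and wOBA is easy to see in code: ISO is just SLG minus AVG, and wOBA is a weighted sum of the same on-field events divided by a plate-appearance denominator (AB + BB - IBB + SF + HBP). This is a minimal Python sketch; the linear weights below are assumed, illustrative values of roughly the right size, not any particular season's official weights.

```python
# Illustrative linear weights, assumed for this sketch. Published wOBA
# weights are recalculated each season, so these are rough values only.
WOBA_WEIGHTS = {
    "ubb": 0.69,  # unintentional walk
    "hbp": 0.72,  # hit by pitch
    "1b": 0.89,
    "2b": 1.27,
    "3b": 1.62,
    "hr": 2.10,
}


def batting_line(ab, singles, doubles, triples, hr, ubb, hbp, sf):
    """Return (AVG, SLG, ISO, wOBA) computed from one hitter's counting stats."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = hits / ab
    slg = total_bases / ab
    iso = slg - avg  # isolated power: extra bases per at-bat
    events = {"ubb": ubb, "hbp": hbp, "1b": singles,
              "2b": doubles, "3b": triples, "hr": hr}
    woba_numerator = sum(WOBA_WEIGHTS[e] * n for e, n in events.items())
    woba_denominator = ab + ubb + sf + hbp  # AB + BB - IBB + SF + HBP
    return avg, slg, iso, woba_numerator / woba_denominator


# A hypothetical hitter's season.
avg, slg, iso, woba = batting_line(ab=500, singles=100, doubles=30, triples=3,
                                   hr=25, ubb=60, hbp=5, sf=5)
print(f"AVG {avg:.3f}  SLG {slg:.3f}  ISO {iso:.3f}  wOBA {woba:.3f}")
```

For this made-up line, AVG .316 and SLG .538 give an ISO of .222. Every input is a countable event from the field; only the weights add interpretation.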

Also, where would you place UZR and +/- on the scale? I don’t have an answer to that one…

by ChiBurbRaysFan on Jan 10, 2010 11:35 AM EST

Good call...WPA would be another good one to put on there

Leverage Index would be another, so I’ll throw those on there.

And yeah, you can definitely argue about where ISO and wOBA should go. I went back and forth on both, but I put them where I did because I felt they were removed enough from the on-field results to seem abstract to people first getting into sabermetrics. That, and they're both highly valuable evaluative tools. And that’s why the graph is a continuum…both stats definitely still have descriptive characteristics. They should be located closer to the right side of that box, if nothing else.

Since measuring a stat’s “descriptive” characteristics is such a subjective thing, this chart is far from the be-all and end-all answer. Also, the setup may not be entirely perfect either…Andy H’s idea for a Venn Diagram might be a bit better suited to it.

by Steve Slowinski on Jan 10, 2010 4:01 PM EST

As for the FIP/ERA comment,

my rationale behind the quote was this: I know that a 3.00 ERA means that a pitcher allowed an average of 3 earned runs per nine innings, but I don’t know exactly what a 3.00 FIP is equivalent to. How many walks, home runs, and strikeouts does a 3.00 FIP performance equate to? Maybe it’s just me, but even though I consider myself pretty well versed in sabermetrics, I couldn’t answer that without looking it up. FIP is just slightly more abstract, and there’s more involved in it compared with the simplicity of ERA.
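One way to see what a 3.00 FIP "equates to" is to run the formula backwards. This Python sketch rewrites FIP in per-nine-inning rates and solves for the home run rate that reaches a target FIP, given K/9 and BB/9. It rests on the same assumptions as any FIP sketch: a constant of 3.10, assumed here rather than taken from a specific season, and hit batters folded into BB/9.

```python
FIP_CONSTANT = 3.10  # assumed league constant; the real one varies by season


def fip_from_rates(k9: float, bb9: float, hr9: float,
                   constant: float = FIP_CONSTANT) -> float:
    """FIP written with per-nine-inning rates (bb9 here includes hit batters).

    Dividing a count by innings pitched equals dividing its per-nine rate
    by 9, so 13*HR/IP becomes 13*hr9/9, and so on.
    """
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


def hr9_for_target(target: float, k9: float, bb9: float,
                   constant: float = FIP_CONSTANT) -> float:
    """Solve fip_from_rates(k9, bb9, hr9) == target for hr9.

    A negative result means no home run rate gets this K/BB profile
    down to the target FIP.
    """
    return (9 * (target - constant) - 3 * bb9 + 2 * k9) / 13


# Different strikeout/walk profiles that all land on a 3.00 FIP.
for k9, bb9 in [(10.0, 4.0), (8.0, 2.5), (6.0, 1.5)]:
    hr9 = hr9_for_target(3.00, k9, bb9)
    print(f"K/9 {k9:.1f}  BB/9 {bb9:.1f}  ->  HR/9 {hr9:.2f}")
```

There is no single answer to "what is a 3.00 FIP?": many K/BB/HR mixes produce the same number, which is part of why it feels more abstract than ERA.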

by Steve Slowinski on Jan 10, 2010 4:05 PM EST

A FIP of 3 could mean many different combinations of K, walk, and HR rates

2009      Garza   Cormier
FIP       4.17    4.18
K/9       8.38    4.19
BB/9      3.50    2.91
HR/9      1.11    0.70

Very different peripherals


by FreeZorilla on Jan 10, 2010 4:12 PM EST

Fair points all around...

Like you say, the plot is more to illustrate a point than to place things exactly, so there’s not much more to be said on the placement of wOBA and ISO.

As for FIP/ERA, even when you say an average of 3 ER per nine were given up, it doesn’t say any more about how those runs were given up than FIP does. Was the pitcher HR prone? Walked a lot of guys? Had a very high BABIP? Defense was good/bad? So you still have the same uncertainty about how you got there that FreeZo pointed out.

Anyway, I think we agree that one way or another FIP is more abstract than ERA, so it’s not really worth nitpicking exactly why or how much. It’s also possible that part of it is just familiarity: we’ve all been used to hearing about ERA since childhood, but learned about FIP much later, so it’s just not as intuitive… much like the metric system is a lot easier to use in theory, but if you didn’t grow up with it you lack the subjective references that make it meaningful (is 180 cm tall, short, or average?). Although FIP being on the same scale as ERA reduces that effect somewhat.

by ChiBurbRaysFan on Jan 10, 2010 11:19 PM EST

One thing I'll add

A lower FIP than ERA does not necessarily mean the pitcher will fare better next year. There can be valid reasons for a spread between ERA and FIP, typically the quality of the defense behind him. You should consider the team’s UZR when evaluating the spread, and for groundball pitchers, pay close attention to the quality of the infield. What FIP does give you is an idea of the pitcher’s true talent regardless of defense. If you took a pitcher from a bad defensive team and put him on a good defensive team, his FIP would remain roughly constant, whereas his ERA should adjust based on the defense.

This is not to say that a pitcher can’t have a lower FIP than ERA in front of a good defense. There is still plenty of luck in baseball.
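The review process described above can be sketched in a few lines of Python. This is a hypothetical helper, not a standard tool: it flags pitchers whose ERA exceeds their FIP by an arbitrary 0.5-run threshold, sorts them by the size of the gap, and carries team UZR along so the defense gets checked before the gap is written off as luck. The pitcher names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    name: str
    era: float
    fip: float
    team_uzr: float  # team UZR in runs above average (positive = good defense)


def review_list(seasons, threshold=0.5):
    """Return (name, ERA - FIP, team UZR) for pitchers whose ERA exceeds
    their FIP by at least `threshold` runs, largest gap first.

    Team UZR rides along so the defense can be checked before the gap is
    blamed on luck. The 0.5-run default threshold is arbitrary.
    """
    flagged = [s for s in seasons if s.era - s.fip >= threshold]
    flagged.sort(key=lambda s: s.era - s.fip, reverse=True)
    return [(s.name, round(s.era - s.fip, 2), s.team_uzr) for s in flagged]


# Made-up seasons for illustration only.
seasons = [
    PitcherSeason("Pitcher A", era=4.50, fip=3.80, team_uzr=-30.0),
    PitcherSeason("Pitcher B", era=3.20, fip=3.90, team_uzr=10.0),
    PitcherSeason("Pitcher C", era=4.00, fip=3.40, team_uzr=25.0),
]
for row in review_list(seasons):
    print(row)
```

Pitcher A's gap came behind a poor defense, so some of it may persist as long as that defense stays behind him. Pitcher C's came in front of a good defense, which points more toward luck.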


by FreeZorilla on Jan 10, 2010 3:37 PM EST

I'm a much more traditional, and older, guy

so leaving batted-ball evaluation out of a pitcher’s equation has always rankled. So I consider FIP more a predictive stat than an evaluative one, or perhaps better, evaluative in the sense of a reality check for ERA. To me, tRA is a much fuller and more evaluative stat than FIP. FIP may be superior as a predictor (?), but that could also be due to variations in pitchers’ actual performance from year to year.

tRA should be amenable to the same kind of analysis you’re describing, FreeZo, and perhaps even more interconnected than FIP, since FIP removes defense completely. Of course, the measurable variations could be much smaller in a tRA vs. ERA analysis than in an FIP vs. ERA one, making it harder to draw conclusions.

by nyyfaninlaaland on Jan 10, 2010 4:43 PM EST

Great read, Steve

I think it boils down to knowing exactly what you are saying. These numbers can get thrown around in an off-hand manner, but they aren’t intended to settle an argument. I liked your take on the dichotomy of evaluative and descriptive and that there is an overlap area. With that in mind, maybe a Venn Diagram would be a better representation of this idea. I don’t really want to get into the minutiae of each and every statistic, but if anyone has a question, Google should be your friend. Most of us here are self-educated on this great topic. We all rose up from the muck to become slightly more enlightened by reading and doing our own work. That is why I never feel all that bad for new users complaining about acronyms. Terp is a great example above. He put it eloquently: you don’t have to change yourself, just open up your mind to new ideas.

I'm a writer.

by Andy Hellicksonstine on Jan 10, 2010 12:11 PM EST

Largely agree

but I think a bit more emphasis is needed on degree of evaluation, and the axis is there, so it’s just a matter of placement. I also wonder if there isn’t a predictive value issue as well.

And perhaps descriptive isn’t quite the right word. There are tabulative stats, the raw meat of statistical constructs, such as K’s, BB’s, etc. I don’t see HR’s as evaluative, but HR rate stats would be. Then there are rate stats (BA, OBP, K/9, ERA, etc.) that tell us a bit more, and to differing degrees; for example, OBP has more evaluative capability, or is at least a more complete picture, than BA. And finally there are evaluative constructs. The lines can get blurry: ISO is really just a rate calculation comparison, but it’s still more evaluative than its components.
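The step from tabulated counts to rate stats, and why OBP says more than BA, shows up clearly with two hypothetical hitters who have identical hits and at-bats but different walk totals. This minimal Python sketch uses the standard BA and OBP definitions; the players and numbers are made up.

```python
def batting_average(h: int, ab: int) -> float:
    """Hits per at-bat; walks and hit batters are invisible here."""
    return h / ab


def on_base_pct(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """Times on base per plate appearance (sacrifice bunts excluded)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Two hypothetical hitters with the same tabulated hits and at-bats.
for name, bb in [("Hitter X", 40), ("Hitter Y", 90)]:
    ba = batting_average(150, 500)
    obp = on_base_pct(150, bb, 3, 500, 5)
    print(f"{name}: BA {ba:.3f}  OBP {obp:.3f}")
```

Both hitters post the same .300 batting average, but the extra 50 walks lift Hitter Y's OBP from .352 to .406: the same counting stats, combined differently, answer a different question.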

by nyyfaninlaaland on Jan 10, 2010 4:55 PM EST

And not that you intend this to be complete

but BABIP is a big one that isn’t displayed.

Another quibble.

I understand the point of excluding HR’s from BABIP data: it’s an attempt to remove non-defensible batted balls. But does it really evaluate pitchers accurately in this case? And perhaps it isn’t completely appropriate for hitters either. It disproportionately benefits pitchers who surrender more HR’s (though park effects also play a big role here), and it skews the evaluation of power hitters. FIP provides a second test for pitchers. As such, perhaps sidebarring BA for hitters and BA against for pitchers with the BABIP data makes some sense. I know there have been studies of average BABIP against for GB-type vs. FB-type pitchers, and the GB types fare worse, but the run-prevention price is much higher for the excluded HR data. And ultimately, measuring run prevention is the end point.
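The home run exclusion is easy to see in the standard BABIP formula, (H - HR) / (AB - K - HR + SF). This Python sketch, with made-up numbers, puts BABIP next to BA against, as suggested above, and holds total hits fixed while changing how many of them left the park.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: strikeouts and home runs removed."""
    return (h - hr) / (ab - k - hr + sf)


def ba_against(h: int, ab: int) -> float:
    """Plain batting average allowed: every hit counts, home runs included."""
    return h / ab


# Hypothetical pitchers: same hits, at-bats, strikeouts, and sac flies,
# but a different share of the hits are home runs.
for hr in (10, 30):
    print(f"HR {hr}: BA against {ba_against(180, 750):.3f}  "
          f"BABIP {babip(180, hr, 750, 150, 5):.3f}")
```

With the same 180 hits allowed, the pitcher who gave up 30 of them as home runs shows a lower BABIP (.261) than the one who gave up 10 (.286), even though his BA against is identical (.240). That is the skew the comment describes.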

by nyyfaninlaaland on Jan 10, 2010 5:06 PM EST

But all that comes back to your well made key point

that it’s all about knowing what you’re trying to measure, and how it will be used.

by nyyfaninlaaland on Jan 10, 2010 5:08 PM EST

This could well be a cornerstone post.

I could see this as an excellent reference for those getting familiar with advanced statistics as a whole. Excellent job, Slowinski!

on Twitter @CubsStats23

by BWoodrum on Jan 10, 2010 1:44 PM EST

Good post. I think the bulk of the difference is explainability and how easy it is to calculate. The math is the problem IMO

The traditional public likes something that is easily explainable and that they can calculate themselves. The average Joe can calculate BA, ERA, and OBP extremely easily; it is 5th-grade math. Once you start having stats with equations with multiple variables and sensitivities, the average Joe becomes lost. It is a higher level of math that the average person never took or has totally forgotten. We can teach someone about tRA until the cows come home, and they may even be able to remember and understand the simple definition. But if they are never able to grasp how, why, and what goes into the calculations, then they are never truly going to adopt it with open arms.

Can we get robots for umpires and a computer to make in game strategy decisions? I'm sick of inconsistently bad umpiring and Joe's pitiful in game management. Oh and Navi (and BJ) need some PED's. BenZo, Bartlett, and Pena do not.

by matthan on Jan 12, 2010 4:52 PM EST

Founded in 2005, DRaysBay is home to, "Progressive statistical analysis and reasoned argument."
