DRaysBay: An SB Nation Community

Differential Baseball Analysis

So I'm probably not running across anything too groundbreaking here, but I think I'll give it a whirl nonetheless: baseball is a game that lends itself to statistical analysis, and rightfully so. The large game/PA sample size leads to statistically significant and noticeable differences between players at every position. The problem with traditional baseball statistics, however, is consistency: a team may get "hot" at the right or wrong times, hit like crazy and pitch lights out, and then all of a sudden seemingly collapse at the end of the season or during a long losing streak. Losing, hitting, and hitless streaks look quite improbable on paper. For example, consider a .300 hitter going on a 10 AB hitless streak: if every at-bat were independent, the probability of going hitless in any given stretch of 10 ABs is 0.700^10, which is about 2.8%. Yet this happens quite frequently in baseball, seemingly more often than the numbers would suggest.
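To put a rough number on that intuition, here is a minimal sketch that compares the chance of going hitless in one specific 10 AB window with the chance of hitting at least one such streak somewhere in a longer run of at-bats. It assumes every at-bat is an independent coin flip at a .300 average, and the 550 AB season length is my own assumption, not something measured.

```python
def prob_hitless_streak(n_at_bats, streak=10, avg=0.300):
    """Chance of at least one run of `streak` straight hitless at-bats
    somewhere in `n_at_bats`, assuming independent at-bats."""
    miss = 1.0 - avg
    # state[k] = probability the current hitless run has length k
    # and no full streak has happened yet
    state = [1.0] + [0.0] * (streak - 1)
    done = 0.0
    for _ in range(n_at_bats):
        nxt = [0.0] * streak
        nxt[0] = sum(state) * avg            # a hit resets the run
        for k in range(streak - 1):
            nxt[k + 1] = state[k] * miss     # a miss extends the run
        done += state[streak - 1] * miss     # the run reaches `streak`
        state = nxt
    return done


single_window = 0.700 ** 10                  # one specific 10 AB stretch
over_season = prob_hitless_streak(550)       # 550 AB is an assumed season
print(f"one specific 10 AB window: {single_window:.1%}")
print(f"at least once in 550 AB:   {over_season:.1%}")
```

Under those assumptions the streak is rare in any one window but very likely to show up at some point over a full season, which may be part of why it feels more common than 2.8%.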

That is because of a simple principle: prior performance is the best indicator of future success. This seems to be true in both the short and long runs, and how can you deny the logic? A pitcher who has performed well recently will probably continue to perform well, and a pitcher who has given up home runs, walks, and hits like nobody's business will probably continue to do so. At the same time, some hitters are rather "streaky" (or are considered that way, at least) and tend to hit better in one portion of the season, hit at a high clip for short periods of time, or reach hitless streaks that make Andruw Jones and Jason Varitek cry. In economics, we say "averages are better than extremes," and so if starters A and B pitch to the tune of a 3.75 ERA, the one who is more consistent should be considered the better starter. That is, if starter A gives up 1-2 runs per start but never blows up, whereas starter B may pitch lights out but also gets nerve-rattled about 10 starts a year, you have two players with contrasting values. In economic terms, in terms of cost control, and in most cases in baseball, starter A is more valuable than starter B because he is more consistent and has less variation in earned runs per inning pitched.

So, going along with this general principle, what if we considered intra-seasonal statistical progressions and regressions using differential equations? In other words, what if we try to describe or predict the team runs scored / runs allowed in a particular outing strictly based on prior performance over the past three, four, or five outings as well as their general running seasonal averages? Would the result be useful and insightful, or a wasted exercise in mathematics? Could we know in advance that the Rays will be a "Summer" team and the Sox a "September" team, because their pitchers and hitters seem to trend that way?

I think it would be really interesting to see this worked out and played out mathematically. It would be a tough differential system to figure, especially since there are so many players and it is difficult to verify with any kind of certainty which data should be ignored and which data should be examined. In the end, however, we should use prior game performance (again, in the small but statistically significant range) to see how streaks, consistency, and prior performance affect the standings and the statistics.
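As a starting point, the idea of predicting the next outing from the last few outings plus the running season average can be sketched as a simple one-step update. This is a discrete stand-in, not the differential system itself. The function name, the `window` and `pull` parameters, and the run totals are all made up for illustration.

```python
from statistics import mean


def predict_next(runs, window=5, pull=0.5):
    """One-step forecast: season average plus a pull toward recent form.

    runs   : runs scored (or allowed) per game so far, oldest first
    window : how many recent outings count as "recent" (assumed default)
    pull   : 0 ignores recent form, 1 uses only the recent average
    """
    if not runs:
        raise ValueError("need at least one completed game")
    season_avg = mean(runs)
    recent_avg = mean(runs[-window:])
    return season_avg + pull * (recent_avg - season_avg)


# Made-up run totals, purely for illustration
games = [3, 5, 2, 8, 4, 4, 6, 1, 7, 5, 3, 9]
for k in (3, 4, 5):
    print(f"last {k} games: predict {predict_next(games, window=k):.2f} runs")
```

Whether any value of `pull` above zero actually beats the plain season average is exactly the question the project would have to answer with real game logs.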

I am a math and economics major, if you can't tell, and I am willing to do all of the math behind this project. I'm sure it has been done before in some incarnation and it will be an exercise for math students in the future, but if you are a data miner and interested in working on an alternative analysis and getting your name out there, let me know and I will hook it up. Also, please give me any thoughts on this approach.

Comments

Differential Analysis

elijadukes,

Baseball Prospectus has a stat that measures something similar to this: flake. It is a measure of variance (standard deviation) between a starter’s starts. This does not measure streaks, however. For example, last year, CC Sabathia was the “flakiest” pitcher, because he had the widest variance between starts. But as we know, he followed exactly the sort of “streaky” pattern you described—mediocre/poor first half, amazing second half.

I like your theory a lot. It does seem like success and failure seem to clump together in certain stretches of time. On a human/psychological level this may have to do with repeatability of actions. Scouts often talk about whether a pitcher can repeat their mechanics, and I think this often defines whether a guy can be successful in the majors. I think the Reds’ Homer Bailey is an example of a guy with a ton of potential but not a lot of consistency. Or, probably a better example, the Mets’ Oliver Perez. On a microcosm, it seems clear that when Perez is able to get into a groove and throw consistently (as opposed to wildly), he is more successful.

Could you make a stat that was a sort of “Marcel” predictor and weighed ERA from the past 3 starts (49.8% last start, 33.2% 2 starts ago, 16.6% 3 starts ago) to predict the next start? To test your hypothesis, this new stat would then need to be a better predictor than overall ERA in the next start. (Or maybe you ought to use FIP to test this, since it fluctuates less?)
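A minimal sketch of that test might look like the following. The weights are the ones quoted above; as written they add up to 99.6%, so the sketch renormalizes them. The game log is invented, innings are kept as whole numbers to avoid box-score notation, and the helper names are hypothetical.

```python
# Last start, two starts ago, three starts ago (weights from the comment)
WEIGHTS = (0.498, 0.332, 0.166)


def era(earned_runs, innings):
    return 9.0 * earned_runs / innings


def weighted_recent_era(starts):
    """starts: list of (earned_runs, innings), oldest first. Uses the last three."""
    last3 = starts[-3:][::-1]                    # most recent first
    total_w = sum(WEIGHTS[:len(last3)])          # renormalize to 100%
    return sum(w * era(er, ip) for w, (er, ip) in zip(WEIGHTS, last3)) / total_w


def season_era(starts):
    return era(sum(er for er, _ in starts), sum(ip for _, ip in starts))


def compare(starts):
    """Mean absolute error of each predictor against the next start's ERA."""
    err_recent, err_season = [], []
    for i in range(3, len(starts)):
        actual = era(*starts[i])
        err_recent.append(abs(weighted_recent_era(starts[:i]) - actual))
        err_season.append(abs(season_era(starts[:i]) - actual))
    n = len(err_recent)
    return sum(err_recent) / n, sum(err_season) / n


# Invented game log: (earned runs, innings pitched)
log = [(2, 7), (4, 6), (1, 8), (5, 5), (3, 6), (0, 7), (6, 4), (2, 7)]
recent_err, season_err = compare(log)
print(f"weighted last-3 error: {recent_err:.2f}")
print(f"season ERA error:      {season_err:.2f}")
```

The hypothesis holds up only if the weighted error comes out lower across many real pitchers, not just one log; swapping `era` for a FIP calculation would follow the same shape.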

But I do think this would add value to an overall analysis of a pitcher’s talent. A 3.50 ERA with a high “flakiness” is not as good as a 3.50 ERA with a low “flakiness.” This analysis would be an improvement on FLAKE: it would show whether someone is streaky vs. whether someone is just random in success/failure.

by a-danv on Mar 19, 2009 4:04 PM EDT

Wouldn't a high flakiness pitcher be more valuable to a low scoring team and a low flake pitcher

be more valuable to a medium to high-scoring team? Assuming that they are the same ERA (or whatever metric you prefer to use) each could be valuable to a certain team.

Do what you love to do and give it your very best. Whether it's business or baseball, or the theater, or any field. If you don't love what you're doing and you can't give it your best, get out of it. Life is too short. You'll be an old man before you know it.

-Al Lopez

by Sandy Kazmir on Mar 19, 2009 8:34 PM EDT

The flakiness you want from a pitcher depends how good he is, not just how good his offense is.

But yes, in general.

In fact, it turns out that you want ALL pitchers to be flaky (and offenses consistent) based on the shape of the distribution of runs scored.

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Mar 20, 2009 10:00 AM EDT

I think I agree with you

I was thinking an example might be a goalie for the Lightning. Since they don’t score goals, (I think it’s a philosophical choice), they would be better off with a high flake goalie, who is capable of winning a game via shutout, even if he gets blown out occasionally. The blowouts might have been losses anyway since giving up 5 is the same as giving up 3 when you play on a team that only scores 2. If you are a low flake guy that seemingly always gives up 2-3, they may never win a game since they suck so hard in shoot-outs.

by Sandy Kazmir on Mar 20, 2009 10:43 AM EDT

I question this sentiment:
In economics, we say “averages are better than extremes,” and so if starters A and B pitch to the tune of a 3.75 ERA, the one who is more consistent should be considered the better starter. That is, if starter A gives up 1-2 runs per start but never blows up, whereas starter B may pitch lights out but also gets nerve-rattled about 10 starts a year, you have two players with contrasting values. In economic terms, in terms of cost control, and in most cases in baseball, starter A is more valuable than starter B because he is more consistent and has less variation in earned runs per inning pitched.

I don’t really feel like delving into it right now, maybe later, but I don’t think I necessarily agree with it.

by rglass44 on Mar 19, 2009 4:27 PM EDT

Good post, though.

I enjoy well-thought-out, meaningful, insightful posts like this. Keep them coming. Hopefully it spurs some good debate.

by rglass44 on Mar 19, 2009 4:28 PM EDT

It's kinda like saying:

If two players play 10 years and have equal WARs (let’s say 50) after those years, which would you rather have: the player who gets 5 WAR every season, or the one with a 10-win season and a 0-win season thrown in? Or at least something to that effect.

by R.J. Anderson on Mar 19, 2009 4:54 PM EDT

I took it as such, and I don't really have an answer.

Despite looking identical, one player is reliable, the other is up and down. Is the unpredictability worth an additional few wins per any given season? What about the seasons in which he flattens out?

I don’t know. I guess if you can assure me that both finish with 50 after 10 I might go with the higher ups, banking on those extra wins making a difference.

by R.J. Anderson on Mar 19, 2009 5:10 PM EDT

Right, as I noted above, you actually prefer the inconsistent starter.

Teams scored 4.8 runs per game from 2000 to 2004, but you obviously can’t score exactly 4.8 runs in a game. The distribution looks something like this:

0 5%
1 9%
2 12%
3 13.5%
4 13%
5 11.5%
6 10%
7 8%
8 6%
9 4%
10 3%
11 2%

So, if a more consistent team is still going to average 4.8 runs per game on offense, that means fewer really high numbers and fewer low numbers. But since the really high numbers are further from the mean than the low numbers, you’ll be trading, say, one 9-run game for a 5-run game while bumping two 3-run games up to 5-run games. Since your pitchers will hold the other team to a lot of 3 and 4 run games, bumping two 3-run games up to five is huge. Coming down from nine runs to five still gives you a decent chance of winning. Extreme example, I realize.

On the pitching side, you want the reverse. It’s worth taking a couple of extra 3-run games in exchange for a 9-run game.
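The offensive trade can be checked with a quick sketch. It assumes the opponent's runs follow the league table above, renormalized because the listed shares stop at 11 runs and add up to 97%, and it treats ties as coin flips as a stand-in for extra innings. Both are my simplifications, not part of the original numbers.

```python
# Share of games (in percent) by runs scored, copied from the table above
RUN_SHARE = {0: 5, 1: 9, 2: 12, 3: 13.5, 4: 13, 5: 11.5,
             6: 10, 7: 8, 8: 6, 9: 4, 10: 3, 11: 2}
TOTAL = sum(RUN_SHARE.values())   # 97: games of 12+ runs are not listed


def win_prob(runs_scored):
    """P(win) when scoring `runs_scored` against an opponent drawn from RUN_SHARE.
    A tie after nine innings is counted as half a win (assumption)."""
    below = sum(p for r, p in RUN_SHARE.items() if r < runs_scored)
    tied = RUN_SHARE.get(runs_scored, 0)
    return (below + 0.5 * tied) / TOTAL


def expected_wins(scores):
    return sum(win_prob(s) for s in scores)


streaky = [9, 3, 3]       # 15 runs over three games
consistent = [5, 5, 5]    # the same 15 runs, spread evenly
print(f"streaky:    {expected_wins(streaky):.2f} wins")
print(f"consistent: {expected_wins(consistent):.2f} wins")
```

Under these assumptions the even split comes out roughly 0.2 wins ahead over the three games, which is the direction the argument predicts for offenses.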

by Sky Kalkman on Mar 20, 2009 10:11 AM EDT

Here's a visual look at the shape of the run distribution

The whole reason for the wanting/not wanting consistency is based on the fact that that graph isn’t very symmetric.

by Sky Kalkman on Mar 20, 2009 10:13 AM EDT

Exactly.

I’m glad you had that stuff handy (or bothered doing it) because I didn’t feel like getting into it. The marginal value of runs past a certain point gets lower and lower and lower. If you had a 3 ERA pitcher that gave up 12 runs in one out of every 4 starts and none in the other three (all complete games, to make it easy), then he would be way more valuable to his team than a guy that gave up 3 every time. It is an extreme example, but those are the easiest to evaluate.

by rglass44 on Mar 20, 2009 11:31 AM EDT

But how many W's would he have?*

W-L is all that matters.

Below you will find this exercise for the 2008 Rays.

Rays
0 7
1 16
2 20
3 18
4 23
5 24
6 11
7 15
8 9
9 3
10 6
11 5
12 1
13 2
14 1
15 1
Mean=4.78, Median=4, Mode=5

Opponent
0 12
1 14
2 22
3 27
4 25
5 15
6 17
7 13
8 5
9 7
10 0
11 1
12 2
13 2
Mean=4.14, Median=4, Mode=3
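For anyone who wants to check or reuse these tables, the summary lines can be recomputed directly from the game counts. This is a small sketch using only the numbers listed above; the variable and function names are my own.

```python
from statistics import mean, median, mode

# Games with 0, 1, 2, ... runs, copied from the two tables above
RAYS = [7, 16, 20, 18, 23, 24, 11, 15, 9, 3, 6, 5, 1, 2, 1, 1]
OPPONENT = [12, 14, 22, 27, 25, 15, 17, 13, 5, 7, 0, 1, 2, 2]


def expand(counts):
    """Turn [games with 0 runs, games with 1 run, ...] into one value per game."""
    return [runs for runs, games in enumerate(counts) for _ in range(games)]


for name, counts in (("Rays", RAYS), ("Opponent", OPPONENT)):
    per_game = expand(counts)
    print(name, len(per_game), "games:",
          f"mean={mean(per_game):.2f}",
          f"median={median(per_game):g}",
          f"mode={mode(per_game)}")
```

Both tables cover 162 games, and the recomputed mean, median, and mode match the summary lines given with each table.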

* Indicates a poor joke, laugh at your own risk.

by Sandy Kazmir on Mar 20, 2009 11:44 AM EDT

This should be viewed as two separate comments

I just didn’t feel like posting twice. The first part is a joke the second is actual data from gamelogs.

by Sandy Kazmir on Mar 20, 2009 12:35 PM EDT

OK

Seemingly, since the Rays were shut out 7 times, there is a 155/162 chance that the team wins a game where “Frank Flaky” gives up 0 runs. Since they scored >12 runs 4 times, there is a 4/162 chance they win a game where he gives up 12. So in 32 starts (24 great and 8 terrible) the team would definitely win 23.16. On the other hand, “Cliff Consistent” would give up 3 runs every game, and the team would thus win every game where they scored >3. In the same 32 starts he would win 19.95. There is the issue of games that fall on the cusps, though, where the team scores exactly what the pitcher allows. There were 8 such cusp games for Flaky, and 27 for Consistent. Divide both by about 5 (162/32), and they have 1.58 and 5.33 games up for grabs. Assuming the team wins both pitchers' extra-inning games at the same rate, it would need to win 86% of its extra-inning games for Consistent to break even.
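The arithmetic above can be reproduced from the 2008 Rays runs-scored table with a short sketch. It follows the comment's own method (complete games, cusp games scaled by 32/162), and the function names are hypothetical.

```python
# 2008 Rays: games with 0, 1, 2, ... 15 runs scored (from the table earlier in the thread)
RAYS = [7, 16, 20, 18, 23, 24, 11, 15, 9, 3, 6, 5, 1, 2, 1, 1]
GAMES = sum(RAYS)   # 162


def sure_wins(starts):
    """starts: {runs_allowed: number_of_starts}. Expected outright wins,
    i.e. games where the offense scores more than the pitcher allows."""
    return sum(n * sum(RAYS[allowed + 1:]) / GAMES
               for allowed, n in starts.items())


def break_even_tie_rate(cusp_flaky, cusp_consistent, flaky, consistent, n_starts=32):
    """Share of cusp (tied) games the team must win for the two pitchers to end up even."""
    gap = sure_wins(flaky) - sure_wins(consistent)
    extra_ties = (cusp_consistent - cusp_flaky) * n_starts / GAMES
    return gap / extra_ties


flaky = {0: 24, 12: 8}     # 24 shutouts, 8 twelve-run blowups
consistent = {3: 32}       # three runs every time out

print(f"Flaky sure wins:      {sure_wins(flaky):.2f}")
print(f"Consistent sure wins: {sure_wins(consistent):.2f}")
print(f"break-even tie rate:  {break_even_tie_rate(8, 27, flaky, consistent):.0%}")
```

This gives 23.16, 19.95, and 86%, matching the comment. One thing worth double-checking: the 27 cusp games for Consistent appear to match the opponent table's count of 3-run games, while the Rays table lists 18 games with exactly 3 runs scored. If 18 is the intended figure, `break_even_tie_rate(8, 18, flaky, consistent)` comes out above 100%, meaning Consistent could not catch up at all under this method.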

by rglass44 on Mar 20, 2009 2:14 PM EDT

ugh

My coffee got cold while I did that.

by rglass44 on Mar 20, 2009 2:14 PM EDT

Pretty interesting that this is what I pulled for Texas:

0 6
1 12
2 15
3 23
4 18
5 19
6 17
7 11
8 8
9 8
10 5
11 6
12 4
13 4
14 2
15 2
16 1
17 1

They have more double digit games than us, but what is interesting is that they don’t win all of those, whereas if we put up 11 runs we are pretty much assured a victory.

by Sandy Kazmir on Mar 20, 2009 2:49 PM EDT

Padilla has to start sometime.

Space.

It's a problem we face.

So we never go anywhere.

We just stay in one place.

by hazel on Mar 22, 2009 3:33 AM EDT

Founded in 2005. DRaysBay is home to "progressive statistical analysis and reasoned argument."