
Differential Baseball Analysis

So I'm probably not stumbling onto anything too groundbreaking here, but I'll give it a whirl nonetheless: baseball is a game that lends itself to statistical analysis, and rightfully so - the large game/PA sample size leads to statistically significant, noticeable differences between players at every position. The problem with traditional baseball statistics, however, is consistency: a team may get "hot" at the right or wrong times, hit like crazy and pitch lights out, and then seemingly collapse at the end of the season or during a long losing streak. Losing, hitting, and hitless streaks are statistically improbable. For example, consider a .300 hitter going on a 10 AB hitless streak - the probability of that event is 0.700^10, which is about 2.8%. Yet this sort of thing happens in baseball seemingly more often than the numbers suggest.
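As a quick back-of-the-envelope check of that 2.8% figure, here is a minimal sketch in Python, assuming each at-bat is an independent trial and the batter hits exactly .300:

```python
# Probability that a .300 hitter goes hitless over 10 consecutive at-bats,
# treating each at-bat as an independent trial with a 30% chance of a hit.
avg = 0.300
streak_length = 10

p_hitless_streak = (1 - avg) ** streak_length
print(f"P(10 straight hitless ABs) = {p_hitless_streak:.4f}")  # ~0.0282, about 2.8%
```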

That is because of a simple principle: prior performance is the best indicator of future success. This seems to be true in both the short and long runs, and how can you deny the logic? A pitcher who has performed well recently will probably continue to perform well, and a pitcher who has been giving up home runs, walks, and hits like nobody's business will probably continue to do so. At the same time, some hitters are rather "streaky" (or are considered that way, at least) and tend to hit better in one portion of the season, hit at a high clip for short stretches, or go on hitless streaks that make Andruw Jones and Jason Varitek cry. In economics, we say "averages are better than extremes," so if starters A and B both pitch to the tune of a 3.75 ERA, the one who is more consistent should be considered the better starter. That is, if starter A gives up 1-2 runs per start but never blows up, whereas starter B may pitch lights out but also gets rattled in about 10 starts a year, you have two players with contrasting values. In economic terms, in terms of cost control, and in most cases on the field, starter A is more valuable than starter B because he is more consistent and has less variation in earned runs per inning pitched.
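To make the "averages are better than extremes" point concrete, here is a small sketch comparing two hypothetical starters whose run lines are invented for illustration: both allow the same average number of earned runs per start, but one has a much larger spread.

```python
import statistics

# Hypothetical earned-run lines over 10 starts; both average 2.5 ER/start.
starter_a = [2, 3, 2, 2, 3, 3, 2, 3, 2, 3]   # steady, never blows up
starter_b = [0, 1, 0, 8, 1, 0, 7, 0, 1, 7]   # brilliant some nights, a disaster on others

for name, runs in [("A", starter_a), ("B", starter_b)]:
    print(f"Starter {name}: mean={statistics.mean(runs):.2f} ER/start, "
          f"stdev={statistics.pstdev(runs):.2f}")
```

Running this shows identical means (2.50) but a standard deviation of 0.50 for starter A versus roughly 3.20 for starter B, which is the sense in which A is the more valuable, more consistent arm.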

So, going along with this general principle, what if we considered intra-seasonal statistical progressions and regressions using differential equations? In other words, what if we tried to describe or predict a team's runs scored and runs allowed in a particular outing strictly from prior performance over the past three, four, or five outings, together with the running seasonal averages? Would the result be useful and insightful, or a wasted exercise in mathematics? Could we know in advance that the Rays will be a "summer" team and the Sox will be a "September" team, because their pitchers and hitters seem to trend that way?
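As a rough illustration of the kind of input such a model might start from (a sketch only, with invented game-by-game run totals and an arbitrary 50/50 weighting between the recent window and the season-to-date average), you could blend short-run and long-run performance like this:

```python
# Blend a short rolling window of recent runs scored with the season-to-date
# average to produce a naive "expected runs" figure for the next game.
runs_scored = [4, 7, 2, 5, 3, 6, 1, 8, 4, 5]  # invented game-by-game totals
window = 5                                    # last five outings
weight_recent = 0.5                           # arbitrary blend; a real model would fit this

recent_avg = sum(runs_scored[-window:]) / window
season_avg = sum(runs_scored) / len(runs_scored)
expected_next = weight_recent * recent_avg + (1 - weight_recent) * season_avg
print(f"Recent {window}-game avg: {recent_avg:.2f}, season avg: {season_avg:.2f}, "
      f"blended forecast: {expected_next:.2f}")
```

The differential-equation version would essentially make that blend dynamic, letting the weight on recent form rise or fall as streaks develop rather than fixing it in advance.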

I think it would be really interesting to see this worked out mathematically. It would be a tough system of differential equations to set up, especially since there are so many players and it is hard to decide with any certainty which data should be ignored and which should be examined. In the end, however, we should be able to use prior game performance (again, over a small but statistically meaningful window) to see how streaks, consistency, and past results affect the standings and the statistics.

I am a math and economics major, if you can't tell, and I am willing to do all of the math behind this project. I'm sure it has been done before in some incarnation and will be an exercise for math students in the future, but if you are a data miner who is interested in working on an alternative analysis and getting your name out there, let me know and I will hook it up. Also, please share any thoughts on this approach.
