The State of Sabermetrics: Thinking Outside The Box

Maybe it's just me, but I feel like sabermetrics has come a long way over the past few years. Back when I first started reading baseball blogs seriously -- and granted, this was as recently as 2008 -- sabermetrics still seemed like a fringe topic. You could find sabermetric stats and concepts discussed on obscure blogs (FireJoeMorgan!), but good luck finding any mainstream site that acknowledged their existence. Jokes about bloggers living in their mothers' basements were in vogue, and it was a point of pride among most writers to be anti-stat.

Not only that, but think about how far the sabermetric community has come since 2008. Back then, there still seemed to be a sharp divide between stats and scouts; it seems ridiculous in retrospect, but I distinctly remember there being a vibe that eyes don't matter -- scouts don't matter -- only stats can tell you the cold, hard truth. This was in the pre-WAR days. Pre-UZR. Pre-wOBA. Well, technically UZR and wOBA were invented earlier than 2008, but they didn't become widely available until StatCorner and FanGraphs started publishing them.

Remember when this is what FanGraphs looked like? When doing fantasy baseball research, I used to have to go traipsing around the internet to find a player's BABIP; nowadays, it takes me all of 5 seconds. The accessibility of statistics has exploded over the past handful of years, and with it, sabermetrics has begun to enter the mainstream.

That's not to say that sabermetric stats will ever be accepted by mainstream fans -- that's likely asking a bit too much -- but it's amazing to me how many places you can find stats like wOBA and BABIP and WAR. If you look around the SB Nation network, they're everywhere. If you look in Sports Illustrated, they're there. ESPN the Magazine? There. Published books? There. Movies? There. In the clubhouse, being talked about among players? There.

Ever since Mike Fast got hired by the Houston Astros, I've been doing a lot of pondering about the current state of sabermetrics. In many ways, it's better than ever. The online sabermetrics community is large and flourishing, and without even looking that hard, you can find a number of insightful analysis pieces every day. There is also still plenty of research being done in the public sphere, and our knowledge about the sport continues to grow on a yearly basis.

And yet...I can't help but feel like public sabermetric knowledge is starting to plateau. There are fewer and fewer researchers left out there who haven't been snatched up by teams, and it seems increasingly likely that the treasure trove of Hit F/x and Field F/x data will never be released to the public. I'm sure that we'll continue to see incremental increases in our knowledge; the advances made in evaluating catcher defense recently have been fantastic, and I think the Steamer projections open up new avenues of exploration for projection systems. But without new data, how much progress can we make?

I've delved into hockey's version of advanced stats recently, and their community is amazing. In the absence of data, they have innovated and begun either to collect the data themselves (using the breadth of the hockey blogger community to help) or to create proxy statistics that measure the concept they want. The NHL doesn't track time of possession? No problem, we can estimate that using the ratio of shots taken by each team.
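To make the proxy idea concrete, here's a rough sketch of that shot-ratio estimate in Python. The function name and the game numbers are made up for illustration, and I'm assuming the simplest possible version (one team's share of all the shots in a game); the hockey folks may well count shots or weight situations differently.

```python
def shot_share(shots_for: int, shots_against: int) -> float:
    """Share of all shots in a game taken by one team (0.0 to 1.0).

    Hypothetical helper: the assumption is that the team holding the
    puck more will also take more of the shots, so this share stands
    in for time of possession.
    """
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded, so the share is undefined")
    return shots_for / total


# Made-up game: the home team takes 33 shots, the visitors take 27.
home_share = shot_share(33, 27)
print(f"{home_share:.3f}")  # prints 0.550
```

A share above 0.500 suggests the team controlled the puck more often than not. It measures possession only indirectly, which is exactly the point of a proxy.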

So I guess my question is: is there any baseball data not currently being tracked that you wish were available? And if so, are there some outside-the-box ways we could compile that data ourselves, or create a proxy stat to measure it? Because at this point, if we want new data, I think it's going to be up to us to collect it.

For me, the biggest piece of data that I think is missing is batted ball speed. That one bit of information could open up a whole range of research opportunities. Can some pitchers actually induce weaker contact? How strongly do weaker contact and BABIP correlate? Is being able to induce weaker contact a predictive skill? What pitches generate the weakest contact? Is a pitcher struggling with a particular pitch? The possibilities are nigh endless.

But damned if I know how to calculate that stat, or even how to vaguely estimate it. Is it possible to create some outside-the-box estimate or proxy? Or are we doomed to forever having this piece of info beyond our grasp?