"Fun" with small sample sizes

Watching the April Rays

Kim Klement-USA TODAY Sports

I'm really going to go out on a limb in this article and make some bold claims. You may not agree with all of them, because I'm not going to play it safe. Columbus didn't discover the New World by lying out on the beach in the Canary Islands, and I'm not going to stop either just because some baseball canary squawks "too bold!" at me.

Here goes nothing:

  • Steven Souza Jr. is not going to hit a home run on 60% of his fly balls. Yes, the man is strong, yes, the man is pious, and yes, the Rays acquired him because they thought he could hit home runs. But we know that generally in baseball, strong, pious men hit home runs on around 20% of their fly balls. And hey look, Souza, for his career, has a 24% HR/FB mark. Expect to see something like that, once he's hit more than five fly balls.
  • Similarly, Corey "Coors-Effect" Dickerson is not going to hit a home run on a third of the fly balls he puts into play, no matter how hard it looks like he swings. His career rate is 18%. As previously mentioned, that's what you would expect from a strong, pious man (which I assume, by the cross he wears on his neck, that he is).
  • Neither Souza nor Dickerson is going to post an isolated power over .400. See above.
  • Enny Romero isn't going to go the whole season without allowing a baserunner. Look. He's faced ten batters and retired all of them. But there's a reason we didn't go into the season touting Enny as our slam-dunk shutdown closer. Don't forget that.
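
For a rough sense of how easy it is to stumble into a hot start like Souza's, here is a minimal Python sketch. It assumes every fly ball is an independent try at his career 24% HR/FB rate, and it assumes the 60% figure means three homers on five fly balls (the rate and the five fly balls are from above, the count of three is my inference). The function name `prob_at_least` is made up for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: chance of k or more successes in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed inputs: 5 fly balls, at least 3 of them homers, true HR/FB of 24%.
hot_start = prob_at_least(3, 5, 0.24)
print(f"Chance of 3+ homers on 5 fly balls: {hot_start:.3f}")
```

Under those assumptions the chance comes out to roughly 9%, so a hitter with perfectly ordinary power for a strong man does this about one time in eleven. No new talent level required.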

Unlike what I've listed above, these are some things that definitely will continue to happen:

  • Logan Morrison will strike out 45% of the time while getting a hit on just .133 of the balls he puts in play. This is obvious, because Logan Morrison is a terrible baseball player with an annoying public persona who the Rays were stupid to acquire and who made them get rid of fan-favorite James Loney. No, wait! This is a trick. That's not going to continue either. Look, Morrison is a professional baseball player with over 2,000 major league plate appearances. For his career, he's struck out 17% of the time, which is better than the league average. His career BABIP is .272, which is slightly worse than league average, but not surprising for a slow guy. Let's quit freaking out when we look at his .069 batting average to start the season. Don't look for reasons why it's real. It's 31 plate appearances.
  • Ditto Brad Miller -- except that his strikeouts and walks are right in line with his career marks. The hits will come.
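
To see how little 31 plate appearances should move you, here is a crude Python sketch that simply pools Morrison's career record with his April. It treats his career as exactly 2,000 plate appearances at a 17% strikeout rate and his April as 14 strikeouts in 31 plate appearances (about 45%). Both counts are rounded assumptions rather than official totals, and `blended_rate` is a hypothetical helper, not anyone's projection system.

```python
def blended_rate(career_rate: float, career_pa: int,
                 recent_events: int, recent_pa: int) -> float:
    """Pool career and recent results into one rate, weighting by sample size."""
    career_events = career_rate * career_pa
    return (career_events + recent_events) / (career_pa + recent_pa)


# Assumed inputs: 17% K rate over 2,000 career PA, then 14 K in 31 April PA.
k_rate = blended_rate(0.17, 2000, 14, 31)
print(f"Pooled strikeout rate: {k_rate:.3f}")
```

The pooled rate lands at about 17.4%, barely above the career mark. A real projection would weight recent seasons more heavily and regress toward league average, but the point survives: 31 plate appearances are a rounding error next to 2,000.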

Think Bayesian

April baseball should be just about the baseball. We don't need to analyze everything. But if you need to think, use the proper framework. In plain English, Bayesian reasoning is about properly incorporating the new information that you get. For everything you see happen in a baseball game, think about how likely it is to happen if it represents that player's true talent, and also how likely it is to happen even if it doesn't. Then combine it with what you already knew about the player.
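
That recipe is just Bayes' rule, and it fits in a few lines of Python. The sketch below is only an illustration: the function name `posterior` and every number fed into it are made up, not measured from any player.

```python
def posterior(prior: float, p_if_real: float, p_if_not: float) -> float:
    """Bayes' rule for a yes/no belief.

    prior     -- how much you believed the claim before the game
    p_if_real -- chance of seeing what you saw if the claim is true
    p_if_not  -- chance of seeing it anyway if the claim is false
    """
    numerator = prior * p_if_real
    return numerator / (numerator + (1 - prior) * p_if_not)


# Hypothetical: you gave a breakout 10% odds, and what you saw is four times
# as likely for a real breakout (40%) as for the same old player (10%).
updated = posterior(0.10, 0.40, 0.10)
print(f"Updated belief: {updated:.2f}")
```

With those invented numbers the belief climbs from 10% to about 31%. The evidence counts, but the prior still does most of the work.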

Let's think about some examples:

  • It's easier to consider something that happens than something that doesn't. For instance, Logan Morrison has not hit a home run yet this season. Problem is, both players who are good at hitting home runs and players who are not often go for 31 plate appearances without hitting one. The fact that Morrison hasn't doesn't really tell you much.
  • You know who has hit a home run this season, off the Rays? Josh Thole. He's only hit nine in his entire career. If you only watched that one game, and if you didn't know anything about Thole, and if you didn't think about the chances of a false positive, you might have gone out and picked him up in your fantasy league. But no one reading this would be dumb enough to do that, right?
  • But there's more information than just whether something leaves the park or not.

  • That's a pitch out over the plate, Thole's hip leaks out early, but he still catches it good. Nice hit. But doesn't tell you much. This next one doesn't embed, so click through and take a look.
  • That's a high fastball, either at the top or just above the strike zone. Corey Dickerson takes a violent but short and perfectly level swing and hammers it. The ball gets out in a hurry. Now compare the two swings. What are the chances that a hitter of Thole's caliber can get on top of a pitch like that and punish it like that? It's not the result that tells us much, but the actual look of the swing suggests that one batter is a guy who probably hits home runs, while the look of the other doesn't really tell us much of anything.
  • And the other side of Bayesian reasoning is incorporating the new information into what you already know. The other day Beyond the Box Score had an article comparing the Statcast data for Kevins Kiermaier and Pillar. Based on a small sample size, the article found that Kiermaier's first step was significantly slower than Pillar's. I don't know the chances for false positives from Statcast data in this sample size, but I do believe they are large. What I do know is that I don't think Kiermaier's first step is significantly slower than anyone's, on average. KK's first step is one of his most obvious strengths. That's part of what the "overboogie" is all about. Kiermaier's first step is immediate, and that often makes it wrong or inefficient. He then adjusts on the fly. So I'm willing to discard the conclusions in this article, because of the strength of my Bayesian prior.
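
The Morrison example above (no home runs in 31 plate appearances) can be put into rough numbers. This Python sketch compares two hypothetical hitters, one who homers in 4% of his plate appearances and one who homers in 2%. Those rates, the 50/50 starting belief, and the helper name `prob_no_homers` are all assumptions chosen for illustration, and each plate appearance is treated as independent.

```python
def prob_no_homers(hr_per_pa: float, pa: int) -> float:
    """Chance of going homerless over `pa` independent plate appearances."""
    return (1 - hr_per_pa) ** pa


# Assumed HR rates: 4% of PA for a power hitter, 2% for a non-power hitter.
slugger = prob_no_homers(0.04, 31)
non_slugger = prob_no_homers(0.02, 31)

# Start at 50/50 on which kind of hitter he is, then apply Bayes' rule.
prior = 0.5
belief_slugger = prior * slugger / (prior * slugger + (1 - prior) * non_slugger)

print(f"Homerless if slugger:     {slugger:.3f}")
print(f"Homerless if non-slugger: {non_slugger:.3f}")
print(f"Belief he's a slugger:    {belief_slugger:.3f}")
```

With these made-up rates, the power hitter goes homerless over 31 plate appearances about 28% of the time and the other guy about 53% of the time. That nudges a coin-flip belief down to roughly 35%, and a strong prior built on 2,000 plate appearances would barely budge.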


Just watch the games. Freak out if you want, but don't freak out so self-assuredly. There will be time to overanalyze everything later.