Understanding Small Sample Sizes, or Evaluating Talent in April

If you are like me, you probably have a million and one questions about the Rays running through your head these days. Will Navarro, Burrell, or Upton rebound? Will Bartlett regress? Will Longoria continue his year-over-year improvement? Will Garza take his game to the next level? Will Zorilla keep mashing home runs? The season has finally begun and it's so exciting to see this team in action. The offense has been encouraging, the starting pitching staff has been downright nasty, and there have been plenty of thrilling games already. Opening Day was like Christmas, but since then we've still been getting a present every day. It's awesome.

And so, with all this pent-up enthusiasm, I find myself perusing FanGraphs, trying to find something - anything - to write about. I want to be able to answer some of those questions, in particular whether Burrell has anything left to offer us. The problem is, it's way too early. Way, way too early. We talk about small sample sizes frequently here on DRaysBay, so here are the points at which certain statistics become reliable:

Offense Statistics:

  • 50 PA: Swing%
  • 100 PA: Contact Rate
  • 150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
  • 200 PA: Walk Rate, Ground Ball Rate, GB/FB
  • 250 PA: Fly Ball Rate
  • 300 PA: Home Run Rate, HR/FB
  • 500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate
  • 550 PA: ISO

Pitching Statistics:

  • 150 BF: K/PA, grounder rate, line drive rate
  • 200 BF: flyball rate, GB/FB
  • 500 BF: K/BB, pop up rate
  • 550 BF: BB/PA

Don't thank me; I'm just the messenger. These numbers were derived by the saberist Pizza Cutter, and although his blog is now defunct, you can find them on the Saber Library website anytime you need them. Or if you'd prefer, read his original article here.

What do these numbers mean, though? It's well and good to throw them out there, but how does this help our analysis at all? Well, I'm so glad you asked!

In short, these numbers represent the minimum number of plate appearances or batters faced that a batter or pitcher needs before a given statistic can be deemed indicative of their true talent level. Notice that BA and ERA are missing from both lists: they don't stabilize even over the course of a full season, which is a great example of why one shouldn't use those statistics to discuss a player's ability level. This is also a great reminder of why spring training statistics are meaningless. Not only are pitchers and batters both working on adjustments, but you're dealing with samples as small as 25 PA or 50 BF. Nothing stabilizes that quickly.
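If you'd rather not scan the lists by eye, here is a minimal Python sketch of the same lookup. The thresholds are copied straight from the lists above; the names `HITTER_THRESHOLDS`, `PITCHER_THRESHOLDS`, and `reliable_stats` are my own inventions for illustration, and treating each cutoff as a hard line is a simplification.

```python
# Thresholds copied from the lists above: PA for hitters, BF for pitchers.
# The dictionary and function names are illustrative, not from any library.
HITTER_THRESHOLDS = {
    "Swing%": 50,
    "Contact Rate": 100,
    "Strikeout Rate": 150,
    "Line Drive Rate": 150,
    "Pitches/PA": 150,
    "Walk Rate": 200,
    "Ground Ball Rate": 200,
    "GB/FB": 200,
    "Fly Ball Rate": 250,
    "Home Run Rate": 300,
    "HR/FB": 300,
    "OBP": 500,
    "SLG": 500,
    "OPS": 500,
    "1B Rate": 500,
    "Popup Rate": 500,
    "ISO": 550,
}

PITCHER_THRESHOLDS = {
    "K/PA": 150,
    "Grounder Rate": 150,
    "Line Drive Rate": 150,
    "Flyball Rate": 200,
    "GB/FB": 200,
    "K/BB": 500,
    "Pop Up Rate": 500,
    "BB/PA": 550,
}


def reliable_stats(sample, thresholds):
    """Return the stats whose cutoff has been reached at this sample size."""
    return [stat for stat, cutoff in thresholds.items() if sample >= cutoff]


# A spring-training-sized sample clears nothing at all.
print(reliable_stats(25, HITTER_THRESHOLDS))   # []
print(reliable_stats(50, PITCHER_THRESHOLDS))  # []

# Roughly a month and a half of everyday play for a hitter.
print(reliable_stats(160, HITTER_THRESHOLDS))
```

The last call returns Swing%, Contact Rate, Strikeout Rate, Line Drive Rate, and Pitches/PA, which is everything at or below the 150 PA line.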

Now that games actually matter, the statistics mean slightly more than spring training stats. Of course we shouldn't be attempting to draw large, sweeping conclusions from the statistics right now, but we can begin to glean something from the numbers. We can start by looking at all the stats that stabilize quickly, even if a player hasn't quite hit the threshold yet. For hitters, I'm paying attention to their Swing%, Contact Rate, Pitches/PA, LD%, and Strikeout Rate. These numbers will give us a good idea of whether a player has changed their approach at the plate and whether they're making solid contact. For pitchers, I'm concerned most with K/9, GB%, LD%, and Swinging Strike Rate. I am adding Swinging Strike Rate for two reasons: one, I don't believe Pizza Cutter tested it initially, and two, swinging strikes are intimately related to strikeouts. Increase one and you should increase the other.

Also, scouting data is by far the best and most reliable information we can have with samples this small, but sadly none of us here are professional scouts. Observations will do in a pinch, though, so I hope people continue to share their impressions of batters and pitchers over these next couple of weeks. I'm not talking about stuff like, "Navarro sux," but stuff like, "Although he's been showing a more discerning eye at the plate this season, Navarro's hits are still weak. He appears lost at the plate at times and has had many bloop hits. I'm not impressed." (Note: this is mostly fictional, although it is true that I haven't been impressed with Navarro yet.) Share your observations, but make them robust. Did their swing look good? Did that pitch have lots of movement, even if it was hit hard? How was that pitch sequence? Keep asking these sorts of questions.

Although these methods aren't necessarily mainstream or sexy, at this point in the season they are the best way to properly evaluate talent. We'll be able to use more and more statistics as the season progresses and sample sizes grow, but for now these are our tools. For reference, batters who have been playing every day have already accumulated 30-40 PA, meaning that they're on the verge of the first cutoff at 50 PA, while pitchers who have made two starts are around 50 BF, well short of their first cutoff at 150 BF. In other words, this dance has just barely begun.
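To put rough numbers on "just barely begun," here is a quick back-of-the-envelope sketch in Python. The two cutoffs come from the lists earlier in the post; the function name `share_of_cutoff` is something I made up for illustration.

```python
# First cutoffs from the lists earlier in the post: Swing% at 50 PA for
# hitters; K/PA, grounder rate, and line drive rate at 150 BF for pitchers.
FIRST_HITTER_CUTOFF_PA = 50
FIRST_PITCHER_CUTOFF_BF = 150


def share_of_cutoff(sample, cutoff):
    """Fraction of the cutoff already accumulated, capped at 1.0."""
    return min(sample / cutoff, 1.0)


# Everyday hitters sit somewhere between 30 and 40 PA right now.
for pa in (30, 40):
    share = share_of_cutoff(pa, FIRST_HITTER_CUTOFF_PA)
    print(f"{pa} PA: {share:.0%} of the way to the first hitting cutoff")

# A starter with two outings has faced about 50 batters.
share = share_of_cutoff(50, FIRST_PITCHER_CUTOFF_BF)
print(f"50 BF: {share:.0%} of the way to the first pitching cutoff")
```

That works out to 60% and 80% for the hitters and about 33% for the pitchers.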