One month in, the only truthful stat is K%

Some musings on the Rays offense and early-season stat stabilization

Cliff McBride/Getty Images

There's only one top-level statistic you can believe about the Rays offense right now, and that's strikeouts. Let me explain.

One of the top-5 most influential works in the early sabermetric period was a study by a man named Pizza Cutter, who sought to determine at what sample sizes baseball statistics become reliable. He originally did this with a split-half correlation. Here's the gist of how to do that:

  1. Take the play-by-play data for a batter (actually, do it for every batter).
  2. Separate it into two buckets by numbering every plate appearance and putting the odd-numbered ones in bucket A and the even-numbered ones in bucket B.
  3. Check the correlation between the two buckets at a bunch of different sample sizes.
  4. Report the lowest sample size at which the r-squared of the correlation reaches at least 0.5.
That then tells you that at the sample size you've identified, you can be reasonably confident that the statistic is in fact telling you something about a player's true talent level over those plate appearances. Half the sample predicts the other half of the sample. This is called "internal consistency."
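
For anyone who wants to poke at the idea, here's a rough sketch of the four steps above in Python. It is not Pizza Cutter's actual code or data. The batters are simulated, the spread of "true" strikeout rates is my own assumption, and I'm counting the sample size as the total plate appearances before the split, so the numbers it prints say nothing about real hitters. The function names are made up for illustration.

```python
import numpy as np

def split_half_r2(outcomes, n_pa):
    """R-squared between odd- and even-numbered PAs over the first n_pa PAs.

    outcomes: 2-D array (batters x plate appearances) of 0/1 strikeout flags.
    """
    sample = outcomes[:, :n_pa]
    bucket_a = sample[:, 0::2].mean(axis=1)  # PAs numbered 1, 3, 5, ...
    bucket_b = sample[:, 1::2].mean(axis=1)  # PAs numbered 2, 4, 6, ...
    r = np.corrcoef(bucket_a, bucket_b)[0, 1]
    return r ** 2

def first_reliable_sample(outcomes, sizes, threshold=0.5):
    """Smallest size in `sizes` whose split-half R-squared reaches the threshold."""
    for n_pa in sorted(sizes):
        if split_half_r2(outcomes, n_pa) >= threshold:
            return n_pa
    return None  # never got there with the sizes we tried

# Simulated league (assumption): true K rates centered on 20%.
rng = np.random.default_rng(0)
n_batters, max_pa = 300, 600
true_k = rng.normal(0.20, 0.05, n_batters).clip(0.05, 0.45)
outcomes = (rng.random((n_batters, max_pa)) < true_k[:, None]).astype(int)

sizes = (50, 100, 200, 400, 600)
for n_pa in sizes:
    print(n_pa, round(split_half_r2(outcomes, n_pa), 3))
print("first size at or above 0.5:", first_reliable_sample(outcomes, sizes))
```

The thing to notice is the shape: the correlation between the two buckets climbs as you feed in more plate appearances, and the "stabilization point" is just wherever it crosses the line you picked.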

Now there are a few things we all need to keep in mind while we look at these numbers:
  • As with pretty much everything in statistics, the 0.5 r-squared threshold is arbitrary. You could choose any level and run with it. Pizza Cutter chose this one because it's more useful than most.
  • When we call a sample size "reliable," what we mean is that we can be reasonably certain that it's an accurate representation of a player's true talent over the span of time we're examining. It is not necessarily a prediction of future true talent. Current true talent is one of the better ways of predicting future true talent, but it's not the only way, and in some cases it may not be the best way.
  • One of the most useful ways to apply this understanding of sample sizes is to use it to identify when something about a player's approach has changed. If you know from past results that a player usually strikes out in around 10% of his plate appearances, but he begins the season striking out in 20% of them, this stabilization rate tells you when you should start to seriously worry that something is wrong.
  • The purpose of all this is to avoid putting undue belief in things which are really just perfectly natural random variation. Sometimes guys go on a hot streak or a cold streak. A cold streak might be because the player is hitting the ball really hard but right at people and it's totally obvious his luck will improve, or it might be because he's totally lost at the plate for some unknown reason, and obviously struggling. Both of those can go in cycles, and the guy who is lost can become found in a hurry. Scouting is better than stats in a small sample size, but that doesn't mean that what you see with your eyes isn't affected by natural variation. Treat everything with caution.
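
To put a rough number on the third point above (the guy who usually strikes out 10% of the time and opens the year at 20%), here's a back-of-the-envelope sketch. To be clear, this is a plain binomial tail probability that I'm swapping in for illustration, not the stabilization method from the study, and it assumes every plate appearance is an independent coin flip with a fixed strikeout chance, which real baseball is not. The 10% and 20% figures are the hypothetical ones from the bullet.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more strikeouts in n PAs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

baseline = 0.10  # hypothetical established strikeout rate
observed = 0.20  # hypothetical early-season strikeout rate

for n_pa in (20, 40, 60, 100):
    strikeouts = round(observed * n_pa)
    chance = prob_at_least(strikeouts, n_pa, baseline)
    print(f"{n_pa} PAs, {strikeouts} K: chance from luck alone = {chance:.4f}")
```

The same doubled strikeout rate gets harder and harder to write off as luck as the plate appearances pile up, which is the whole point of waiting for a reasonable sample before you panic.
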
Pizza Cutter went to work for a real baseball team for a while, where his new bosses convinced him that his name was dumb. When he reemerged in the public sphere, he changed his name to Russell Carleton, and now he writes for Baseball Prospectus. One of the first things he did was rework the old iconic study with a more complex but essentially similar methodology. That's what I'll link to and run with.

Strikeouts

Okay, so if you clicked over and read the article, you'll see that there is basically only one statistic we should have confidence in right now: strikeout percentage. It stabilizes for hitters at 60 plate appearances. The next to go will be walk percentage, at 120 plate appearances.

The Rays who play every day will reach that walk-rate threshold in about a week, but they aren't there yet. Isolated power and home run rate will trail into relevancy a little under a month from now.

All of the other major batting statistics we hold so dear are more or less useless at this point, so don't get hung up on them, and definitely don't believe in current performances over expected performance.

Evan Longoria - 106 PAs - 20.2% career, 17.9% now

There's a perception that Evan Longoria isn't hitting well. It's seemed like his power has declined, and he's fouling off pitches he used to drive. His batting eye has seemed off to me, and his contact ability has seemed poor. Just a week ago I flipped over to his FanGraphs page to start writing a "Longo Decline" piece, and was surprised to see that there's not much evidence of a decline in the top-level stats. In the one statistic that we should be reasonably comfortable using right now, Longo is performing better than he has in every single season except 2011.

Now, there are more nuanced ways to look for decline, but I'm going to hold the phone on them.

Asdrubal Cabrera - 104 PAs - 17.2% career, 24.0% now

This one is a case where our eyes and our memories are not deceiving us. Cabrera has not looked good to start the season. He's struck out at a higher rate than he ever has in any full season of play. More investigation is warranted into how and why he's striking out, but it's not a good sign for the Rays' free-agent prize.

Steven Souza Jr. - 104 PAs - no career rate worth mentioning, 38.5% now

This is the doozy. Steven Souza is supposed to become a fixture in the heart of the Rays order for the next six (five) years. Overall he's been good, due to some very real power, a willingness to take a walk, and a high BABIP, but the part of his batting line that's already stabilized shows a hitter who's being exploited by major league pitchers right now.

This is where the whole distinction between current true talent and prediction becomes really important. I totally believe that Souza has been a true-talent 38% strikeout guy to date. He's earned that unfortunate number by not protecting a large enough zone and by getting fooled badly at times when he does. But he's also a rookie with an odd career path who's barely spent over a season's worth of plate appearances in the upper minors (323 PAs in double-A and 407 PAs in triple-A). He's pretty new to facing quality pitching. If he can continue to learn on the job, he'll become a monster. If not, there's always room for another Juan Francisco in Japan.

Logan Forsythe - 96 PAs - 19.0% career, 12.5% now

So, this is the one to get excited about. Forsythe is starting to live up to the potential Andrew Friedman and the Rays front office saw when they traded for him. Brad Boxberger has already justified trading Alex Torres, but if Forsythe can continue his fine work it will offset the success of Jesse Hahn (now pitching in Oakland).

Yes he's benefiting from a .338 BABIP that is out of line with his career rate, but the rest of the package is believable. His plate discipline numbers have been better (swinging at fewer pitches outside of the zone -- and missing them less often -- while swinging at the same rate of pitches inside the zone), and his walks are correspondingly up as well, but the part of his stat line that we can most believe in, the strikeouts, is way down from his career rate.

That combination of the sample size and the magnitude of the change makes a strong case that Logan Forsythe has become a better player.

Rene Rivera - 81 PAs - 25.1% career, 29.6% now

Rene Rivera doesn't have a ton of major league experience, so the Bayesian prior isn't strong with this one. I've actually thought his contact ability looked better than this, but the numbers don't lie. Rivera appears right now to be a worse hitter than the one the Rays thought they were trading for.

As is almost always said by someone when Rivera is discussed in the comments, though, his work behind the plate has been sterling.

Kevin Kiermaier - 79 PAs - 19.2% career, 17.7% now

Rays fans everywhere hope that Kiermaier truly is a late bloomer. Like Rivera, his glove would keep him in the lineup (at least against righties) even if his bat faltered, but as of yet his bat has not faltered. It's too early to believe in the power (which has been very good), and it's also too early to believe in the walk rate (which hasn't been very good), but the part of the stat line that we can most comfortably believe in should make us cautiously optimistic, especially since he's achieved the lower strikeout rate while facing a lefty 30% of the time as opposed to the 22% from last year.

Desmond Jennings - 72 PAs - 19.9% career, 12.5% now

It's a real shame that Desmond Jennings hasn't been able to stay on the field. His line so far this season looks putrid, but hidden within all the noise was a very good sign similar to Forsythe's. He had struck out far less than at any other point in his career. Now if he could just come back soon and continue on that momentum . . .

Yes, I know that's too much to ask.

Brandon Guyer - 67 PAs - 18.5% career, 20.9% now

Guyer is striking out just a bit more than he has in the past, but a change of about two percentage points isn't really telling us much over this still-small sample size. Guyer remains who we thought he was: a perfectly valuable guy to have around, but probably not someone you want playing every day.
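
If you want to see why a gap that size doesn't register, here's a quick sketch using the ordinary binomial standard error. It's my own illustration rather than anything from the stabilization study, and it assumes each plate appearance is an independent trial at the hitter's career strikeout rate. The helper name is made up; the rates and plate appearance counts are the ones listed in this post, with Forsythe included for contrast.

```python
from math import sqrt

def k_rate_standard_error(rate, n_pa):
    """Binomial standard error of a strikeout rate measured over n_pa plate appearances."""
    return sqrt(rate * (1 - rate) / n_pa)

# (career K%, current K%, current PAs), as listed in this post
hitters = {
    "Guyer": (0.185, 0.209, 67),
    "Forsythe": (0.190, 0.125, 96),
}

for name, (career, now, n_pa) in hitters.items():
    se = k_rate_standard_error(career, n_pa)
    gap = now - career
    print(f"{name}: gap {gap:+.3f}, standard error {se:.3f}, ratio {gap / se:+.2f}")
```

For Guyer, the standard error over 67 plate appearances works out to a bit under five percentage points, so his gap is only about half of one standard error. Forsythe's drop is a good deal bigger relative to his.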

Tim Beckham - 64 PAs - no career rate worth mentioning, 37.5% now

Right now, Beckham is Souza-lite. Yes, he's only just past the 60 plate appearance threshold, but that threshold isn't actually magic, and the magnitude of the strikeout rate is very notable. He's shown good power, but he's also been pretty overmatched, and will need to adjust to survive.

David DeJesus - 64 PAs - 14.1% career, 15.6% now

The power is slightly lower for DDJ than it's been in the past, and his walks are down compared to recent years, but those are still pretty noisy statistics in his small sample size. Judging by strikeout rate, DeJesus at age 35 is still the same hitter he was for 273 plate appearances at age 34, and it's a damn good thing the Rays didn't trade him during the offseason.

Conclusion

That's it for what you should believe in as far as Rays batting statistics go. If you see someone cite anything else you should either (a) tell them they're dumb, (b) shrug, or (c) investigate more deeply, and try to examine the hitter on a very granular level, probably using a well-thought-out systematic scouting approach or PITCHf/x data. I recommend the third option, but it's important to remember that even when you go granular you can still make small sample size mistakes. Baseball analysis is hard, so even when you apply yourself fully to option number three, the final result is still often a shrug.

Up next (this week) is pitchers, where K%, BB%, GB%, and FB% have all stabilized for starters, followed by hitter BB% in a few weeks.