As an introduction, my posts will be an attempt to crowdsource some sabermetric analysis on DRaysBay. The format is that I'll post an article presenting some analysis of a topic, showing a few different approaches. Hopefully this will generate a discussion in the comments so that in future a follow-up can be posted which incorporates the ideas people bring up in the comments.
If people have anything they'd like me to look into, please also mention it in the comments. Rays-related questions are particularly encouraged!
Full disclosure: I'm not that familiar with what's been done in the past, so I might go over things which have already been discussed or fail to give credit to people who deserve it. Sorry! I'm not aiming for completeness; I'm just trying to start a discussion. OK, with that out of the way, let's dig in. - Nick
Do hitters get "hot"?
Today I want to talk about whether or not hitters can get "hot". This is a pretty old topic, but hopefully there's still something interesting to be found. To tackle this problem I'm going to look at Evan Longoria's career game-log (I got this from Baseball Reference), specifically his game-by-game OBP. I chose Longoria because I'm hoping he's one of the few hitters with a long track record (2008 – 2014) who will still be on the team on opening day.
Approach 1: Power Spectrum
My first approach is to calculate the power spectrum (Wikipedia link) for Longoria's game-by-game on-base percentage. Power spectra work by fitting waves with different frequencies to data. If a particular frequency is dominant then there will be more "power" at that frequency. For instance, in the power spectrum of surface temperature over the United States there would be a peak corresponding to a frequency of (1 day)^-1, i.e. one cycle per day. On the other hand, if everything is random then the power is roughly the same at all frequencies.
Why is it useful to calculate the power spectrum here? Well suppose Longoria's hitting consists of alternating hot and cold streaks. Then there would be more power at lower frequencies and less at the high frequencies. You can also think of this as saying that there would be more predictability at the lower frequencies, and this would provide evidence for the existence of hot streaks. Even better would be if Longoria tends to go on hot-streaks of a particular length, say 25 games; then there would be a noticeable peak at the corresponding frequency.
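To make the idea concrete, here is a minimal Python sketch using NumPy. It is not the code behind the results in this post, and it runs on synthetic series rather than Longoria's game log: one series with a built-in 25-game hot/cold cycle and one that is pure noise. The function name, the noise levels and the series themselves are all assumptions chosen for illustration.

```python
import numpy as np

def power_spectrum(values):
    """Return (frequencies in cycles per game, power) for a 1-D series."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency term does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # d=1.0: one game between samples
    return freqs[1:], power[1:]  # drop the zero-frequency bin

# Synthetic stand-ins for a game log (illustrative numbers, not real data).
rng = np.random.default_rng(0)
games = np.arange(1000)
streaky = 0.35 + 0.10 * np.sin(2 * np.pi * games / 25) + rng.normal(0, 0.05, games.size)
random_only = 0.35 + rng.normal(0, 0.20, games.size)

freqs, p_streaky = power_spectrum(streaky)
_, p_random = power_spectrum(random_only)

# A series with a characteristic streak length shows one dominant peak;
# white noise spreads its power roughly evenly across frequencies.
peak_period = 1.0 / freqs[np.argmax(p_streaky)]
print(f"streaky series: strongest period is about {peak_period:.1f} games")
print(f"streaky series: max/median power = {p_streaky.max() / np.median(p_streaky):.0f}")
print(f"random series:  max/median power = {p_random.max() / np.median(p_random):.0f}")
```

With a real game log, the per-game OBP values would replace the synthetic arrays. Keep in mind that even pure noise produces a jagged spectrum, so a single tall spike is not evidence of streakiness on its own.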
Here are the results. I've done the calculation three times to check how robust the results are, once for the period 2008-2012, once for 2008-2013 and once for 2008-2014. I've also filtered out a significant fraction of the spectra for clarity.
The data is quite noisy, with the spectra looking different in each case. You could argue that from 2008 to 2012 Longoria tended to go on hot/cold streaks of about 20 games and that over the last two years he's been less streaky, but I think the safest interpretation is that the power is roughly the same at all frequencies. I've compared the spectra to white noise and they resemble that as much as anything.
So this suggests that Longoria doesn't have a characteristic length for how long he stays hot or cold, and in fact his OBP is pretty random – his recent performance doesn't tell you much about his future performance in terms of being hot or cold.
Approach 2: The length of the hot streaks
If you buy the results of Approach 1, then it seems like Longoria's hitting is pretty random, but what if we just look at times when he's hitting well?
To do this, I've gone through the game-logs and for each game asked "Was his average OBP over the last 3/5/8/etc. games over .400?". There are many other ways of counting hot streaks, but the results come out quite nicely this way. .400 is a low threshold, but the results are pretty similar if I use .450 or .500; a lower threshold leaves more qualifying games, which makes the statistics more robust. I've then divided the number of games for which this is the case by the total number of games Longoria has played to get a percentage.
Again, I've repeated the calculations for three different time-spans to see how repeatable the results are. The longest hot-streaks I've looked at are 75 games; the numbers drop off quickly after this.
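The counting procedure can be sketched in a few lines of Python. This is an illustration rather than the code used for the plots, and it makes one assumption the description above leaves open: OBP over a window is computed by pooling times on base and plate appearances across the window, not by averaging the per-game OBP values. The function name, its arguments and the simulated game log are hypothetical.

```python
import numpy as np

def hot_fraction(times_on_base, plate_appearances, window, threshold=0.400):
    """Fraction of all games on which OBP over the last `window` games exceeds `threshold`."""
    on_base = np.asarray(times_on_base, dtype=float)
    pa = np.asarray(plate_appearances, dtype=float)
    kernel = np.ones(window)
    # mode="valid" keeps only the games that have a full window behind them.
    on_base_sum = np.convolve(on_base, kernel, mode="valid")
    pa_sum = np.convolve(pa, kernel, mode="valid")
    # Assumption: pooled OBP over the window; guard against windows with no plate appearances.
    rolling_obp = on_base_sum / np.maximum(pa_sum, 1.0)
    # Divide by the total number of games played, as described in the text.
    return np.count_nonzero(rolling_obp > threshold) / len(on_base)

# Simulated hitter with no streakiness at all: 4 plate appearances per game,
# each reaching base with probability .351 (illustrative, not a real game log).
rng = np.random.default_rng(1)
pa = np.full(1000, 4)
on_base = rng.binomial(pa, 0.351)

fractions = {n: hot_fraction(on_base, pa, n) for n in (3, 5, 8, 15, 25, 50, 75)}
for n, frac in fractions.items():
    print(f"last {n:2d} games over .400: {100 * frac:.1f}% of games")
```

Running the same function on a purely random hitter like this one gives a baseline to compare a real player's percentages against.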
In the plots the blue lines and dots show the data and the red lines show fits with decaying exponentials (Wikipedia link). I've made the fits by eye, but I think they're pretty good.
Interpreting these plots is a bit tricky. The blue dots show that, for example, from 2008 to 2012, in roughly 39% of Longoria's games you could say "Over his last 3 games Longoria's OBP has been above .400", while in roughly 15% of his games you could say "Over his last 50 games Longoria's OBP has been above .400". Just for reference, Longoria's career OBP is .351.
The fact that exponential curves fit the data again suggests that the length of the hot-streaks is essentially random, though you could argue that for the last 2 years Longoria has tended to go on 15 game hot streaks slightly more often. I'm actually surprised that decaying exponentials work over such a large range. It means that the chance of a hot streak ending is basically independent of the length of the hot streak.
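For anyone who wants to go beyond fitting by eye, here is a minimal sketch of a least-squares alternative: fit a straight line to the logarithm of the percentages, since an exponential decay is a straight line on a log scale. This swaps in a different technique from the by-eye fits in the plots. The input values are made up from a known curve (amplitude 45%, decay time 25 games) purely to show that the fit recovers them; they are not the numbers from the plots.

```python
import numpy as np

def fit_exponential(window_lengths, percentages):
    """Fit percentages ~ amplitude * exp(-window / decay_time) via a line fit to log(percentages).

    All percentages must be positive, because the fit takes their logarithm.
    """
    n = np.asarray(window_lengths, dtype=float)
    y = np.asarray(percentages, dtype=float)
    slope, intercept = np.polyfit(n, np.log(y), 1)
    return np.exp(intercept), -1.0 / slope

# Made-up points lying exactly on a known exponential (illustrative only).
windows = np.array([3, 5, 8, 15, 25, 50, 75])
synthetic = 45.0 * np.exp(-windows / 25.0)

amplitude, decay_time = fit_exponential(windows, synthetic)
print(f"amplitude = {amplitude:.1f}%, decay time = {decay_time:.1f} games")

# For an exponential, stretching the window by one game always shrinks the
# percentage by the same factor, whatever the window length already is.
per_game_drop = 1.0 - np.exp(-1.0 / decay_time)
print(f"each extra game removes about {100 * per_game_drop:.1f}% of the remaining streaks")
```

The constant per-game drop is the "independent of the length" property in numbers: a decay time of 25 games corresponds to losing roughly 4% of the remaining streaks with each extra game.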
One last thing I like about these plots is how Longoria's declining performance is neatly captured: as the last two years are included all of the percentages go down, showing that his hot streaks have become rarer. The decay time of the exponentials has also gone down from 25 games to 22 and then 21 games (again, these are rough fits), i.e. his hot streaks have gotten shorter on average, or to put it another way, the chance of a hot streak ending has increased.
So that's two approaches to looking at hot streaks that I haven't seen elsewhere. I should mention that I've looked into home runs a bit, but not to the same extent as OBP. In any case, here's a summary of the results so far:
- as far as I can tell, hot streaks are random
- the length of hot streaks can be modelled as an exponential decay, just like radioactive decay, heat transfer and beer froth. This means that the probability of a hot streak ending is constant, and doesn't depend on how long the streak's been going.
- Longoria's decline over the last two years coincides with him going on fewer hot streaks, which have also been shorter on average.
As a final caveat, these approaches have been based on looking at frequencies. Other approaches, like looking at the clustering of good games, might give different results. Clustering is also a method I don't have much experience with. If anyone has ideas about other approaches or other players who might be more or less streaky, let me know.