When we examine pitching, we look at three different levels. Everybody looks at ERA and WHIP, and then, if we're a bit more advanced, at FIP, xFIP, SIERA, tERA, and FRA, too. But what happens when we see a change in a pitcher's SIERA? We try to find meaning in the pitcher's component statistics: K%, BB%, GB%, FB%. That's all well and good, but why has a pitcher's K% changed? Why is Fernando Rodney walking dramatically fewer batters than he ever has in the past? To answer these questions, we look at his statistics on the per-pitch level. We examine the percentage of his pitches that go for a ball, a called strike, a swinging strike, or a foul, and the percentage that are put in play. We come up with theories about how each level affects the others. But frankly, it's guesswork. I know that if a pitcher throws fewer balls, he'll walk fewer batters, but I'll be damned if I can tell you how many fewer. Until now.
This past week, I set out to create a model of how pitch-level statistics relate to outcome-level statistics. Think of it this way. There are many ways to walk a batter. A pitcher can throw ball, ball, ball, ball; or ball, ball, ball, swinging strike, ball; or ball, ball, ball, foul, ball; or foul, ball, looking strike, ball, foul, ball, foul, ball. You get the point. There are a lot of ways to accomplish each outcome. Frankly, too many to count (I tried).
But there's another way to go about it. It's a lot easier to comprehend if you work backwards. For example, what are the possible outcomes of a 3-2 count? A ball results in a walk. A swinging strike results in a strikeout, as does a strike looking. A ball in play is a ball in play, and a foul resets everything back to a 3-2 count.
Calculating the probability of each outcome would be very straightforward if not for the foul ball problem, but that's really just a minor wrench in the proceedings. Let's go back to the example of a walk. Starting in a 3-2 count, here's how a pitcher can end up walking a batter. He can throw a ball. Or he can have a pitch fouled off and then throw a ball. Or he can have two pitches fouled off and then throw a ball. Or he can have three pitches fouled off and then throw a ball. You get the idea. To represent this mathematically, we can say that his probability of walking a batter in a 3-2 count is (pBall*pFoul^0) + (pBall*pFoul^1) + (pBall*pFoul^2) + (pBall*pFoul^3) + ..., continuing all the way to infinity, where pBall is the probability of a ball and pFoul is the probability of a foul. This is of course fairly easy to calculate in a spreadsheet using the SERIESSUM function, although for my purposes (because I don't know how to tell Excel to calculate a sum to infinity) I've just set it to include the possibility of 50 foul balls.
If the baseball run environment changes so that there is a meaningful difference between 50 foul balls in a row and infinity foul balls in a row, someone please change my model; I'll have quit watching baseball.
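For what it's worth, the 50-foul cutoff can be checked against a closed form. Because pFoul is less than 1, the sum pBall*(1 + pFoul + pFoul^2 + ...) is a geometric series that equals pBall/(1 - pFoul). Here is a minimal Python sketch comparing the two. The function names are mine, not anything from the spreadsheet, and the inputs are the David Price ball and foul rates quoted further down in this post.

```python
def walk_from_full_count_truncated(p_ball, p_foul, max_fouls=50):
    """Sum pBall * pFoul^k for k = 0..max_fouls (the spreadsheet-style cutoff)."""
    return sum(p_ball * p_foul ** k for k in range(max_fouls + 1))


def walk_from_full_count_closed(p_ball, p_foul):
    """Closed form of the infinite geometric series: pBall / (1 - pFoul)."""
    return p_ball / (1.0 - p_foul)


# David Price's ball and foul rates as quoted in this post.
p_ball, p_foul = 0.361, 0.176

truncated = walk_from_full_count_truncated(p_ball, p_foul)
closed = walk_from_full_count_closed(p_ball, p_foul)

print(f"50-foul cutoff: {truncated:.6f}")
print(f"closed form:    {closed:.6f}")
print(f"difference:     {abs(truncated - closed):.2e}")
```

With a foul rate anywhere near real-world levels, the two agree to far more decimal places than anyone needs, so the cutoff is harmless.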
After we've calculated the probabilities of the possible outcomes of a 3-2 count, we can move back in the count. For instance, assuming a 2-2 count, the probability of a strikeout equals (the probability of a swinging strike) + (the probability of a looking strike) + (the probability of a ball)*(the already calculated probability of a strikeout in a 3-2 count), with all of this of course multiplied by the sum of the power series of the probability of a foul ball (1 + pFoul + pFoul^2 + ...), since a foul with two strikes just repeats the count. As you work back in the count, it's just a matter of repeating the same calculations over and over while referencing counts further along in the pitch sequence for which you've already found all the probabilities of outcomes.
Here is my spreadsheet calculating probabilities in every count: ModelingTake1. Feel free to download and play with it for yourself. My current inputs of pitch outcome probabilities, for those who are interested, are David Price's pitch outcomes from this year: 36.1% ball, 8.7% strike swinging, 20.8% strike looking, 17.6% foul, and 16.9% balls in play. The outcomes of this model are 10.6% walk, 28.8% strikeout, and 60.7% balls in play. Comparing these projected outcomes to Price's actual outcomes of 8.5% walk, 23.8% strikeout, and 67.7% balls in play, we can see that my model seems to underestimate the number of balls in play and to overestimate both walks and strikeouts.
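For anyone who would rather read code than a spreadsheet, the working-backwards calculation can be sketched in Python roughly as follows. This is my own reconstruction under stated assumptions, not a transcription of ModelingTake1: the function name is made up, a foul with fewer than two strikes is treated as a strike (the usual rule), a foul with two strikes repeats the count and is handled with the closed-form factor 1/(1 - pFoul) instead of a 50-term sum, and the same pitch probabilities are used in every count.

```python
from functools import lru_cache


def plate_appearance_outcomes(p_ball, p_swinging, p_looking, p_foul, p_in_play):
    """Return (walk, strikeout, in_play) probabilities starting from an 0-0 count."""
    p_strike = p_swinging + p_looking

    @lru_cache(maxsize=None)
    def from_count(balls, strikes):
        # Terminal states: ball four is a walk, strike three is a strikeout.
        if balls == 4:
            return (1.0, 0.0, 0.0)
        if strikes == 3:
            return (0.0, 1.0, 0.0)

        after_ball = from_count(balls + 1, strikes)
        after_strike = from_count(balls, strikes + 1)

        if strikes < 2:
            # Assumption: a foul with fewer than two strikes adds a strike.
            p_adds_strike = p_strike + p_foul
            repeat_factor = 1.0
        else:
            # With two strikes a foul repeats the count, so sum the
            # geometric series 1 + pFoul + pFoul^2 + ... = 1 / (1 - pFoul).
            p_adds_strike = p_strike
            repeat_factor = 1.0 / (1.0 - p_foul)

        walk = repeat_factor * (p_ball * after_ball[0] + p_adds_strike * after_strike[0])
        strikeout = repeat_factor * (p_ball * after_ball[1] + p_adds_strike * after_strike[1])
        in_play = repeat_factor * (
            p_in_play + p_ball * after_ball[2] + p_adds_strike * after_strike[2]
        )
        return (walk, strikeout, in_play)

    return from_count(0, 0)


# David Price's pitch outcome rates as quoted above (they sum to 100.1% after rounding).
walk, strikeout, in_play = plate_appearance_outcomes(
    p_ball=0.361, p_swinging=0.087, p_looking=0.208, p_foul=0.176, p_in_play=0.169
)
print(f"walk {walk:.1%}, strikeout {strikeout:.1%}, in play {in_play:.1%}")
```

Run with the Price inputs, this sketch should land within a fraction of a percentage point of the model outputs reported above, which suggests it captures the same logic, though I can't promise it matches the spreadsheet cell for cell.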
This is not altogether unexpected. I'm assuming that all pitch outcomes are equally likely in each count. In reality, pitchers who are behind in the count are more likely to throw strikes so as to avoid the walk, and hitters who are behind in the count are more likely to take a defensive approach, trying to put the ball in play and avoid the strikeout. That's okay, as this is only a preliminary attempt at modeling pitching. Here are the next steps.
- Input more individual pitchers, as well as league average baselines, starting pitcher baselines, and relief pitcher baselines, so as to get a better feel for how this model does and does not reflect pitching reality.
- Rework the model to take inputs of pitch results for individual pitch types. This will allow us to project how differences in pitch mix will affect overall pitcher success (for example, how Rodney got dramatically better by completely abandoning his slider, among possible other fixes).
- Add other adjustments into the model so that it more closely mirrors real world pitching. I'm very much interested in hearing suggestions on this front.
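As a rough idea of where the pitch-type step could start, one simple option is a usage-weighted blend: take each pitch type's outcome rates, weight them by how often the pitch is thrown, and feed the blended rates into the existing model. The Python sketch below is only that, a sketch. The function name is hypothetical, and every number in it is a made-up placeholder for illustration, not real data for Rodney, Price, or anyone else. It also ignores sequencing and count-dependent pitch selection.

```python
def blend_pitch_outcomes(usage, outcomes_by_pitch):
    """Usage-weighted average of per-pitch-type outcome rates."""
    total_usage = sum(usage.values())
    blended = {}
    for pitch, share in usage.items():
        weight = share / total_usage  # normalize in case shares don't sum to 1
        for outcome, rate in outcomes_by_pitch[pitch].items():
            blended[outcome] = blended.get(outcome, 0.0) + weight * rate
    return blended


# Placeholder numbers for illustration only; NOT real data for any pitcher.
usage = {"fastball": 0.7, "changeup": 0.3}
outcomes_by_pitch = {
    "fastball": {"ball": 0.36, "swinging": 0.08, "looking": 0.20, "foul": 0.20, "in_play": 0.16},
    "changeup": {"ball": 0.34, "swinging": 0.18, "looking": 0.10, "foul": 0.14, "in_play": 0.24},
}

blended = blend_pitch_outcomes(usage, outcomes_by_pitch)
for outcome, rate in blended.items():
    print(f"{outcome}: {rate:.3f}")
```

Changing the usage shares and re-running the model on the blended rates would then give a first-pass estimate of what a different pitch mix does to walks, strikeouts, and balls in play.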
Let me know what else you'd like to see included, or how you'd like to see this model of pitching tested. I know this is something that needs fleshing out, but I'm not entirely certain where to go with it yet after I break it out into individual pitch types. One obvious direction is to pay attention to types of balls in play, but that's a fairly complex area, dependent on pitch type, pitch location, and sequencing. If you can point me towards any specific bits of research I can use, I would love to try to incorporate them into the model.