An optional primer on pitch shape data

What is pitch tracking data? How can one use it?

The future is now as MLB pushes real-time data technology Stefan Stevenson/Fort Worth Star-Telegram/Tribune News Service via Getty Images

Baseball has access to more and better data than any other sport, and much of that data is available to the general public. This has created an open discourse that has produced a lot of amazing analysis and has also powered the cultural rise of sports analytics fandom, which has since spilled over into all the other major sports.

The jewel in the crown of public baseball data is the pitch tracking system, which has been around in one form or another since 2007. It’s amazing and wonderful, and it can also be overwhelming and intimidating, with multiple sources hawking data that almost-but-doesn’t-quite match, and that can be expressed in different ways.

I’ve made a visualization to help conceptualize this data in the league context, but if this is all new to you, and you want to back up and understand what the data is and how it’s used, this overview is for you.

Pitch Tracking Measurement

Since 2007, the flight of almost every pitch on its way to home plate has been tracked by one of several systems. Originally it was Pitchf/x, a system made by Sportvision that used high-speed cameras to track the flight of the ball. By 2011, the Pitchf/x system was replaced throughout major league ballparks by TrackMan, which used radar to track the flight of the ball on its way to the plate, the initial flight of the ball off the bat, the final landing spot of the ball, and the position of the fielders. Starting in 2019, the TrackMan system was replaced by Hawkeye, which brought baseball back to high-speed cameras, but, like, better ones. Every new system has brought incremental improvements to accuracy and precision, and new measurables.

Perhaps the biggest advantage of Hawkeye over TrackMan is that Hawkeye is able to measure the spin on a baseball directly, rather than only inferring spin based on movement. This new quantification of spin has improved our understanding of baseball aerodynamics and has opened up exciting new frontiers of pitch design. There’s a lot of great work going on right now about the different effects on batters of movement based on Magnus forces (directly due to spin) and movement based on other forces (like “seam-shifted wake;” think of a wiffle ball).

This is not that. For our purposes, movement is movement, and that’ll have to do.

Classification

In addition to the measurements themselves, pitch classifications have also improved in recent years. When a pitch tracking system measures the flight of a baseball, the raw data is made up of thousands of measurements of ball position, each at a specific point in time. That then gets transformed by a combination of #computers, #math, and #physics into a usable description of a pitch trajectory. There are different ways of expressing this trajectory, each with positives and negatives (which we’ll get into later), but for right now just think of each individual pitch as a combination of release point, speed, spin angle, spin velocity, horizontal and vertical movement due to spin and other drag forces, final location at the front of the strike zone, and pitch result.

That’s a lot of valuable data, but to be useful for in-game analysis it needs to be organized into buckets that match as closely as possible either with what the pitcher perceives himself to be throwing, or with how the batter identifies and processes the pitches he sees. This is where pitch type classifications come into play.

In the early days, the automatic algorithmic classifications that came with the Pitchf/x data — the ones you saw on Gameday or on the stadium scoreboard — were simply not very good, and their flaws made quality analysis without reclassification impossible. Not only were they inaccurate, but the algorithm changed every year, so the inaccuracy wasn’t systemic and year-to-year comparisons were impossible. To do decent analysis, one had to first do one’s own classifications.

Then two things happened. First, Dan Brooks, Harry Pavlidis, and Lucas Apostoleris got good enough at manual pitch classification to do it at scale, and then they went and did it at scale. The company this massive effort became is Pitch Info, and the public domain arm of Pitch Info is the wonderful (if now slightly defunct) Brooks Baseball. Pitch Info data has an added bonus of being calibrated to smooth differences in the tracking systems between stadiums.

At the same time, the algorithmic pitch classification used by MLB improved by leaps and bounds, to the point where it’s now really quite good, and is perfectly usable for most purposes (although it’s still a good idea to be careful with year-over-year comparisons). There’s a wonderful breakdown by Sam Sharpe of the history and current practice of MLB’s algorithmic classification that you should stop and read.

But even now that there are two sources of good pitch classifications, it’s still worth taking the time to think about how the classification choices we make shape our perspective on how pitching works. One of my favorite articles on this is Ethan Moore’s guest piece in Baseball Prospectus, examining whether analysis would be improved by a greater number of more descriptive pitch type buckets (short answer: yes).

And there are aspects of pitch movement which are not a part of the pitch classification at all, but that do significant work in explaining why some pitches “work” and others don’t. The most significant of these is probably angle of approach, which takes movement, release point, and pitch location into account to figure out how the ball is actually moving at the moment it crosses the plate. Alex Chamberlain wrote an excellent primer on vertical angle of approach (VAA) at FanGraphs and makes those calculations available in his custom leaderboards.
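For a feel of what goes into an approach angle, here is a minimal sketch of one common way to estimate it from per-pitch kinematics. It assumes constant acceleration, and the parameter names (`vy0`, `vz0`, `ay`, `az`) mirror velocity and acceleration fields I'm assuming are present in a per-pitch export; the reference distances of 50 feet and the front of the plate are also assumptions. This is not necessarily the calculation behind the FanGraphs leaderboards.

```python
import math

def vertical_approach_angle(vy0, vz0, ay, az, y0=50.0, y_plate=17.0 / 12.0):
    """Approximate vertical approach angle (degrees) at the front of the plate.

    Assumes constant acceleration between y0 and y_plate (both in feet from
    the back tip of home plate). Velocities are ft/s, accelerations ft/s^2;
    vy0 is negative because the ball travels toward the plate.
    """
    # Velocity toward the plate at y_plate, from v^2 = v0^2 + 2*a*(displacement).
    vy_f = -math.sqrt(vy0 ** 2 - 2.0 * ay * (y0 - y_plate))
    # Time taken to cover that distance.
    t = (vy_f - vy0) / ay
    # Vertical velocity when the ball reaches the plate.
    vz_f = vz0 + az * t
    # A negative angle means the ball is descending as it crosses the plate.
    return -math.degrees(math.atan(vz_f / vy_f))

# Illustrative (made-up) inputs, not a real pitch.
vaa = vertical_approach_angle(vy0=-135.0, vz0=-5.0, ay=28.0, az=-15.0)
```

With those made-up inputs the result is a downward angle of a bit under five degrees. Release height and pitch location enter through `vz0`, which is why two pitches with identical movement can still arrive on different planes.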

My hope for the future is that the baseball fans and armchair analysts will someday be able to easily pull data bucketed by one of several pitch categorization schemes, and that the flexibility will usher in a new golden age of pitching and hitting research.

In our present reality, though, the best publicly available pitch data comes from Pitch Info, and it is available as player averages on the FanGraphs leaderboards, so we’ll be using their classifications. Those are:

  • FA — fastball, or four-seam fastball — a hard pitch with rise, and usually a small amount of armside run
  • SI — sinker, or two-seam fastball — a hard pitch with less rise and more run than the four-seam. A few sinkers have actual drop but most don’t.
  • CH — changeup — a soft pitch which mirrors the movement of the fastball or sinker but at a lower velocity, generally with some rise and some armside run. There are changeups that match fastball movement exactly, and others that run and drop more than is common for even the best sinkers.
  • FS — splitter, or split fingered fastball — a softer-than-the-fastball pitch that mirrors the fastball, much like a changeup does, but one that emphasizes neutral rise and minimizes armside run.
  • FC — cutter, or cut fastball — a hard pitch (but usually less hard than the fastball) with either less armside run than the fastball, no armside run, or even some gloveside cut. Some cutters have less rise than the fastball as well, morphing nearly into sliders.
  • SL — slider — a hard breaking pitch with gloveside movement. Some sliders have significant gloveside movement (it’s now in vogue to call these “sweepers”) while others are more up and down, approaching zero rise or even crossing into the territory of pure drop.
  • CU — curve ball — a softer breaking pitch with significant drop. Some curves also have significant gloveside movement while some do not.

If you’re comparing these classifications to pitch data from another source that uses MLBAM classifications, know that the Pitch Info “sinker” contains both the MLBAM “two-seam fastball” and “sinker,” and that which breaking balls get called cutters, sliders, curves, and slow curves can differ confusingly between the two and bears special attention in every individual case.
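If you ever need to line the two schemes up in code, a cautious crosswalk might look like the sketch below. The MLBAM codes (`FF`, `FT`, `SI`) and the four-seam mapping are my assumptions; only the two-seam and sinker merge comes from the note above. Everything else is deliberately left unmapped so it gets looked at pitcher by pitcher.

```python
# Hypothetical crosswalk from MLBAM pitch codes to the Pitch Info labels
# listed above. Only the fastball family is mapped automatically.
MLBAM_TO_PITCH_INFO = {
    "FF": "FA",  # four-seam fastball (assumed to map one-to-one)
    "FT": "SI",  # MLBAM two-seam fastball folds into the Pitch Info sinker
    "SI": "SI",  # MLBAM sinker
}

def to_pitch_info(mlbam_code):
    """Return a Pitch Info label, or None when manual review is needed."""
    return MLBAM_TO_PITCH_INFO.get(mlbam_code)

# Breaking balls come back as None on purpose.
needs_review = [code for code in ("FF", "FT", "SL", "CU") if to_pitch_info(code) is None]
```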

Sources and Expression

There are now several high quality sources for pitch tracking data. Everything used here comes from an export of the Pitch Info leaderboards, which is available on FanGraphs. This is the same data you can find on the Brooks Baseball player pages. Speed and movement are calculated from 55 feet, rather than from the actual release point. That movement is expressed from the catcher’s perspective, and in relation to a theoretical spinless pitch.

This is a tricky concept to wrap your head around, but it’s worth taking the time to do so, because it’s the most common way of describing pitch data. In a pristine lab environment with no breeze and a uniformly smooth ball, a pitch thrown with no spin would travel in a parabolic arc toward home plate, slowing down because of air resistance and accelerating downward because of gravity.

Note that this “straight” pitch — which definitely does not exist and cannot be thrown consistently in the real world — would not actually look “straight” to us. It would look something like Aaron Loup’s cutter.

The pitch that in baseball (as opposed to physics) terms we call “straight,” is either an average or below average fastball (which one to call “straight” is a worthwhile debate). That average (by movement) fastball for a righty rises about eight inches (8 in.) more than the theoretical spinless pitch would (although in absolute terms it does not rise), and runs about four inches to the armside (-4 in.).

Sticking with the first name theme, this “straight” fastball (the below average kind) looks a bit like the one that Aaron Slegers throws.

One of the great things about the Brooks Baseball website is that it used to express pitch shape in Z-scores, rather than inches and miles per hour (this feature is currently broken, here’s hoping for an eventual return). This is a good way of conceptualizing just how unusual the shape of any given pitch is, and is a good way to think about batter perception. The more different a pitch is from most of the pitches batters see, the more difficult it may be for them to adjust to that pitch quickly.

Sticking with Slegers, here are the Z-scores for his 2020 pitch shapes, compared to right-handed pitchers with more than 50 innings pitched.

Aaron Slegers 2020 pitch shape, Z-scores
Brooks Baseball

By these numbers, Slegers’s four-seam fastball was half a standard deviation slower and had half a standard deviation more armside run than the average pitch, while rising a full standard deviation less than average.

That lack of vertical rise is actually right on the edge of being significantly different than the norm, but is on the side that is generally considered undesirable. That is, most pitchers these days want their fastballs to generate more rise than is normal, not less. I love this example, because it sits uncomfortably on one of the central pitching questions: some characteristics make a pitch harder to hit in a vacuum, while some make it harder to hit in context, and there’s both an individual pitch mix context and a league context to consider. Teasing out why a pitch does or doesn’t work can be as much art as science.
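Since the Z-score view on Brooks is currently unavailable, here is a rough sketch of how you might compute the same kind of number yourself. The field names (`velo`, `h_mov`, `v_mov`) and all the values are made up for illustration, and I'm assuming a population standard deviation; Brooks may well have done it differently.

```python
from statistics import mean, pstdev

def z_scores(pitch, population, fields=("velo", "h_mov", "v_mov")):
    """Z-score each field of one pitch against a list of comparison pitches."""
    scores = {}
    for field in fields:
        values = [p[field] for p in population]
        mu, sigma = mean(values), pstdev(values)
        # How many standard deviations this pitch sits from the group mean.
        scores[field] = (pitch[field] - mu) / sigma
    return scores

# Made-up comparison group of fastballs (mph, inches, inches).
population = [
    {"velo": 90.0, "h_mov": -2.0, "v_mov": 6.0},
    {"velo": 92.0, "h_mov": -4.0, "v_mov": 8.0},
    {"velo": 94.0, "h_mov": -6.0, "v_mov": 10.0},
]
scores = z_scores({"velo": 91.0, "h_mov": -5.0, "v_mov": 6.0}, population)
```

The comparison group is the whole game here: score the same pitch against all right-handers, or only against right-handed relievers, and you can get a noticeably different answer.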

Texas Leaguers is another great resource for pitch tracking data, and is constantly rolling out improvements. As it stands, it’s my favorite place to go for pitch shape graphs.

Aaron Slegers
Texas Leaguers

As on Brooks, the data here is expressed from the catcher’s perspective, with relation to the hypothetical movement of a spinless pitch (which would be at the [0, 0] mark). Unlike Brooks, the data is scraped from MLB Gameday and MLB Stats API, and therefore uses the MLBAM algorithmic classifications, and a slightly different starting point for the presentation of both speed and movement. That means that, while you’re looking at basically the same thing from the two sources, the numbers are slightly different and can’t be compared directly to each other.

The other thing to remember about the Texas Leaguers graphs is that the average movement of the pitch is at the center of the circle, while the size of the circle tells you how often the pitcher throws that pitch. This can be confusing at first, because pitch movement is really a range, and not an average, so it may be your nature to perceive the circle as a range of movement. I sometimes find myself thinking that pitchers who throw their fastball often have more rise, simply because their fastball circle reaches higher on the chart.
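To keep the circle's center, its size, and the real pitch-to-pitch range separate in your head, it can help to compute all three. This is a small sketch over made-up `(pitch_type, h_mov, v_mov)` tuples; it is not how Texas Leaguers builds its charts, just the same three ideas side by side.

```python
from collections import defaultdict
from statistics import mean, pstdev

def summarize_pitches(pitches):
    """Group (pitch_type, h_mov, v_mov) tuples into per-type summaries."""
    by_type = defaultdict(list)
    for pitch_type, h_mov, v_mov in pitches:
        by_type[pitch_type].append((h_mov, v_mov))

    total = len(pitches)
    summary = {}
    for pitch_type, moves in by_type.items():
        h = [m[0] for m in moves]
        v = [m[1] for m in moves]
        summary[pitch_type] = {
            "center": (mean(h), mean(v)),      # where the circle is drawn
            "usage": len(moves) / total,       # what the circle size encodes
            "spread": (pstdev(h), pstdev(v)),  # the actual range of movement
        }
    return summary

# Made-up pitches: two fastballs and one slider, movement in inches.
summary = summarize_pitches([("FA", -4, 8), ("FA", -6, 10), ("SL", 3, 1)])
```

A fastball thrown 70 percent of the time gets a big circle even if its `spread` is tiny, which is exactly the trap described above.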

Over the past few years there’s been an amazing boom in pitching data made available directly from MLB at Baseball Savant. As at Texas Leaguers, this data uses the MLBAM algorithmic classifications. But to make things very confusing, this data is presented differently than that at Brooks Baseball or Texas Leaguers, and is actually not consistent across all areas of the site.

Take the Baseball Savant pitch movement leaderboard for 2020 fastball movement, for example, which is similar to what you’ll see if you follow the Baseball Savant Gamefeed. In this presentation, vertical movement is expressed as actual, real-world movement, which means that it includes the effect of gravity (Brooks Baseball can also display vertical movement including gravity). All fastballs have true drop, but the 2020 leader in least drop, James Karinchak, had his 95.5 mph fastball drop only 9.5 inches, while the 2020 leader in most drop, Tyler Rogers (a submariner with true downward force), had his 82.4 mph fastball drop 53.6 inches.

This is a more concretely descriptive number, and as such may be a valuable one. But at the same time, the variables are less independent than they were in the previous presentation style. That’s because including gravity plays up a problem with the data that we’ve to this point been ignoring. All pitches have a set of forces acting on them, from gravity, spin (Magnus force), and other drag forces (like seam-shifted wake). The slower they are, the more time those forces have to work before they reach the plate, and the more they result in pitch movement. This was also true when we were eliminating gravity, but because gravity is a relatively high magnitude force it becomes very important here.
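A back-of-the-envelope sketch shows how much flight time alone matters. It assumes a constant-speed pitch over 55 feet with no drag and no spin, so the numbers are only illustrative and won't line up with any source's measurement window.

```python
G_FT_PER_S2 = 32.174  # standard gravity in ft/s^2

def gravity_drop_inches(speed_mph, distance_ft=55.0):
    """Drop from gravity alone over distance_ft, ignoring drag and spin."""
    speed_ft_per_s = speed_mph * 5280.0 / 3600.0
    flight_time = distance_ft / speed_ft_per_s  # constant-speed approximation
    # Distance fallen under constant gravity, converted from feet to inches.
    return 0.5 * G_FT_PER_S2 * flight_time ** 2 * 12.0

fast = gravity_drop_inches(95.5)
slow = gravity_drop_inches(82.4)
```

Under these assumptions the 95.5 mph pitch falls roughly 30 inches from gravity alone and the 82.4 mph pitch roughly 40, so about ten inches of difference shows up before spin has done anything at all.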

The Baseball Savant solution is to also display pitch movement relative to the average of all pitchers with similar speed (+/- 2 mph) and release extension (+/- .5 ft.). This is a good solution some of the time, but at the extremes of velocity and extension it’s vulnerable to that average becoming a volatile small sample size baseline.

For instance, Pete Fairbanks’s fastball vertical movement is rated as just about average by this method, while Glasnow’s is slightly above average, and no, these pitchers aren’t all about fastball spin. But their velocity and extension also put them in a group where they’re being compared to backspin monsters like Aroldis Chapman and Josh Staumont, without a ton of “normal” pitchers to water the sample down.
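The comparison-group idea is easy to sketch, and doing so makes the small-sample problem visible. The field names and values below are hypothetical, and whether the real leaderboard excludes the pitcher himself from his own baseline is something I'm assuming, not something I know.

```python
from statistics import mean

def movement_vs_comparables(target, pitchers, velo_window=2.0, ext_window=0.5):
    """Vertical movement relative to pitchers with similar speed and extension.

    Returns (difference, sample_size); difference is None if no one qualifies.
    """
    peers = [
        p for p in pitchers
        if p is not target
        and abs(p["velo"] - target["velo"]) <= velo_window
        and abs(p["extension"] - target["extension"]) <= ext_window
    ]
    if not peers:
        return None, 0
    baseline = mean(p["v_mov"] for p in peers)
    return target["v_mov"] - baseline, len(peers)

# Made-up pitchers: velocity (mph), extension (ft), vertical movement (in).
target = {"velo": 97.0, "extension": 7.0, "v_mov": 17.0}
league = [
    target,
    {"velo": 98.0, "extension": 7.2, "v_mov": 18.0},
    {"velo": 96.0, "extension": 6.8, "v_mov": 16.0},
    {"velo": 90.0, "extension": 6.0, "v_mov": 14.0},
]
diff, n = movement_vs_comparables(target, league)
```

Returning the sample size alongside the difference is the point of the exercise: a baseline built from two pitchers deserves a lot less trust than one built from fifty.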

There are times, of course, when you don’t want to work with aggregate numbers and instead need per-pitch data. Baseball Savant makes those available as well, with their truly fantastic search tool that allows you to pull specific situational pitches from the full database and then download those pitches as a .csv. This is great. What you will notice, though, is that the vertical movement measurements are presented in feet (this is not a big deal) and don’t match the measurements you find from any other source (this is kind of a big deal). What they do match is the Induced Vertical Movement you might see if you’re looking at direct TrackMan readouts.

Why are these numbers so different? For one, they’re measuring along the entire flight of the ball, not just from 50 or 55 feet. But I’m not convinced that’s sufficient to explain the magnitude of the difference. My suspicion is that it has something to do with frame of reference and the computation of the initial vector, but overall this is something that the internet has been gaslighting me about for years. I’m just warning you so you don’t feel crazy when you run into it for yourself. The point is that you have to keep all your direct comparisons in-source. No mixing.
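The feet part, at least, is easy to deal with. Here is a minimal sketch that reads per-pitch rows and converts movement to inches. The column names (`pitch_type`, `pfx_x`, `pfx_z`) are what I'm assuming the download uses, so check them against your own file, and the sample rows are made up.

```python
import csv
import io

def movement_in_inches(csv_text):
    """Yield (pitch_type, horizontal_in, vertical_in) from per-pitch rows."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["pfx_x"] or not row["pfx_z"]:
            continue  # skip pitches with missing tracking data
        yield (
            row["pitch_type"],
            float(row["pfx_x"]) * 12.0,  # feet to inches
            float(row["pfx_z"]) * 12.0,
        )

# Made-up sample standing in for a downloaded file.
sample = "pitch_type,pfx_x,pfx_z\nFF,-0.5,1.25\nSL,,\n"
rows = list(movement_in_inches(sample))
```

Converting units does nothing about the frame-of-reference mismatch, so the converted numbers still belong only next to other numbers from the same download.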

There’s one final note about why you can’t directly compare across sources, which is that they use different computational methods, as detailed here by Dr. Alan Nathan. At least as of 2018, Pitch Info used the method Nathan recommends, while MLB (and by extension Texas Leaguers) did not. This, along with an appeal to “classification differences,” is sufficient to explain most discrepancies between the sources, so you should feel free to wave your hand, say “#physics,” and move on.

Okay, if you’ve made it this far, now jump over to the pitch shape tool.