What To Do With Projection Systems

If you do a Google search for "Baseball Projection Systems," you get at least 3,300 results. These days, it seems that everyone has their own way of projecting what baseball players will do. Each season, someone sits down to review how the systems do; in 2010, Tom Tango did it and just this morning Matt Swartz reviewed last year's numbers. The Baseball Projection Project shows just how many systems are out there today for consumers to digest.

Earlier today, Erik posted a story about what PECOTA, the projection system from Baseball Prospectus, had to say about the 2012 Rays; its numbers were released yesterday. How you handle these projections is up to you, but projections are somewhat like batting averages. The best hitters in baseball fail to get a base hit 70 percent of the time, while baseball projection systems can miss on 30 percent of their projections and still call it a banner year. If you have followed pre-season standings projections over the past four seasons, you know the Rays have often exceeded what the spreadsheets thought they would do, sometimes by as many as 12 games.

As Dayn Perry put it in 2010, "These systems use things like platoon splits, ballpark data, groundball percentages, line-drive rates, strikeouts, unintentional walks, aging patterns, league environment, pitch location and pitch speed data, and so on and so on. It's a science, if occasionally an inexact one."

In Erik's post earlier today, ZiPS creator Dan Szymborski posted this comment that sums up the volatility of projections quite well:

In the last 4 years, of which there has been little substantive change to the mean projections, ZiPS has had:
A really good hitting, really good pitching year
A really good hitting, really mediocre pitching year
A really mediocre hitting, really good pitching year
A really mediocre hitting, really mediocre pitching year

Consider the following projections for Evan Longoria's 2012 season, collected from a variety of the available projection systems:


The spread between the highest and lowest projections for his batting average is 19 points. The spread in his on-base percentage is 30 points, while his slugging percentage varies by 53 points. The spread in the counting categories is somewhat smaller once you weigh those categories against the projected at-bat totals.

Rather than rely on any one projection system (and this coming from someone who works for a place that publishes one), I am a strong proponent of aggregate projections. The quick and dirty way is to gather as many projections as you can and average them out, as I did in the above table with Longoria. The more advanced way is to do what Ross Gore, Cameron Snapp, and Timothy Highley have done.

That trio built an Aggregate Projection System and published research a few years back showing that their methodology was more accurate than any one of its constituent projection systems. My head hurts from reading all of the math in this paper, but the data is there to show the advantages of an aggregate approach to projections over relying on any one system.
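The quick and dirty averaging described above can be sketched in a few lines of Python. This is only an illustration of a simple average, not the methodology from the Gore, Snapp, and Highley paper. The system names and numbers below are made-up placeholders rather than real projections for any player, and the function names `aggregate` and `spread_in_points` are my own.

```python
from statistics import mean

# Placeholder projections: hypothetical systems and made-up numbers,
# NOT real output from any published projection system.
projections = {
    "system_a": {"avg": 0.270, "obp": 0.350, "slg": 0.480},
    "system_b": {"avg": 0.280, "obp": 0.370, "slg": 0.520},
    "system_c": {"avg": 0.290, "obp": 0.360, "slg": 0.500},
}


def aggregate(projections):
    """Return the simple (unweighted) average of each stat across systems."""
    stats = next(iter(projections.values())).keys()
    return {
        stat: round(mean(p[stat] for p in projections.values()), 3)
        for stat in stats
    }


def spread_in_points(projections, stat):
    """Return the gap between the highest and lowest projection, in points."""
    values = [p[stat] for p in projections.values()]
    return round((max(values) - min(values)) * 1000)


consensus = aggregate(projections)
print(consensus)
print(spread_in_points(projections, "avg"))
```

A plain average treats every system as equally trustworthy. Weighting each system by how well it has performed in past reviews would be a natural next step, but that requires accuracy data this sketch does not assume.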

As Tango and Swartz have most recently pointed out, some systems do better than others, but there is strength in combining all of the information from those systems to get a clearer view of what might happen in the coming season. These are not numbers that are presented with 100 percent certainty in terms of player performance. Rather, each set of numbers reflects a more conservative or more liberal view of what could happen that season, depending on how its system builds projections.

Bill James might produce optimistic numbers while CAIRO can be pessimistic, but increasing the sample size of data is always a good thing. Do not go out there and look for the projections that match your own expectations; gather everything you can, aggregate it into a single view, and compare that view to your own to see what you can learn from it.