

A New Regression Tool

Omar Vizquel: closing in on infinite plate appearances. Mandatory Credit: Tom Szczerbowski-US PRESSWIRE

By now, I imagine most regular DRaysBay readers understand the concept of regression. Basically, it's the idea that what we see in any finite number of plate appearances is not a perfect measure of a player's true talent; in all likelihood, his true talent is closer to league average than his numbers make it appear. If a player had infinite plate appearances, our information would be perfect, and there would be no need to regress. However, only the very best players are ever given that many plate appearances. For everyone else, we need to regress. How much we regress depends on how many plate appearances a player has accumulated and on what we know about the spread in talent between players for each individual skill.
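One common way to write this down, assuming the "add N plate appearances of league-average performance" form used by the regression amounts later in this post: with x as the player's observed rate over PA plate appearances, μ as the league average, and N as the number of league-average PAs added,

```latex
\hat{x} = \frac{\mathrm{PA}\cdot x + N\cdot\mu}{\mathrm{PA} + N}
        = w\,x + (1 - w)\,\mu,
\qquad w = \frac{\mathrm{PA}}{\mathrm{PA} + N}
```

As PA grows, the weight w on the player's own numbers approaches 1, so a player with (nearly) infinite plate appearances needs essentially no regression. N is larger for skills where true talent varies little from player to player, because more of the observed spread in those stats is random noise.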

Regression is especially useful when talking about platoon splits, which is why I previously made a manual splits calculator tool (the one I used to use, from berselius on Another Cubs Blog, no longer worked). The problem is that, while my earlier tool did all the calculations automatically, entering the data was alarmingly difficult. To fix this, I've made a new tool (.xls version) (updated as of 8/19/2012), and for good measure I've included a number of other statistics that one might wish to regress. Please feel free to make a copy and use it whenever you wish. All data is from Fangraphs.

The tool consists of five spreadsheet tabs; here is an explanation of each. All of the data covers the 2000-2012 seasons.

CustomReport: This tab is a custom report I made in Fangraphs. In it, I included (in this order) PA, Pitches, wOBA, Swing%, Contact%, GB%, FB%, ISO, HR/FB, and Pace.

AdvancedLeft: This tab is the Fangraphs "advanced batting" leader board of all batters vs. left-handed pitchers.

AdvancedRight: This tab is the Fangraphs "advanced batting" leader board of all batters vs. right-handed pitchers.

ZiPS (RoS): This tab holds the ZiPS rest-of-season projections. I'm using them because I wanted to include 2012 data (so that the tool can be used on rookies), and the rest-of-season projections give an overall wOBA that already incorporates all of the data I'm using.

Calculations: Finally, the first tab (described last here). To use the tool, just fill out the two red columns (Player and Handedness). If you enter a name in the same form that Fangraphs uses, the tool will import the data and do the rest of the calculations for you.
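For readers who want to see the logic outside of a spreadsheet, here is a rough Python sketch of what a tab like Calculations might do. It is not the spreadsheet's actual formulas: it assumes the observed platoon split is regressed by adding some number of PAs vs. LHP of a league-average split (with separate values for left- and right-handed hitters), and that the regressed split is then spread around an overall wOBA projection such as ZiPS. The names `SplitLine`, `PLATOON_PARAMS`, `regressed_split`, `platoon_woba`, and `calculate` are hypothetical, and the numbers in `PLATOON_PARAMS` are placeholders, not the values from The Book.

```python
from dataclasses import dataclass


@dataclass
class SplitLine:
    """One hitter's split data (like a row from AdvancedLeft/AdvancedRight)."""
    pa_vs_lhp: int
    woba_vs_lhp: float
    pa_vs_rhp: int
    woba_vs_rhp: float


# Hypothetical inputs per batter handedness:
# (league-average split as wOBA vs LHP minus wOBA vs RHP, PAs vs LHP to add).
# Placeholder numbers for illustration only; substitute the values from The Book.
PLATOON_PARAMS: dict[str, tuple[float, int]] = {
    "R": (0.020, 2000),
    "L": (-0.030, 1000),
}


def regressed_split(line: SplitLine, handedness: str) -> float:
    """Regress the observed (vs LHP minus vs RHP) split toward league average."""
    league_split, regression_pa = PLATOON_PARAMS[handedness]
    observed = line.woba_vs_lhp - line.woba_vs_rhp
    n = line.pa_vs_lhp  # assumption: weight by PAs vs LHP, usually the smaller sample
    return (observed * n + league_split * regression_pa) / (n + regression_pa)


def platoon_woba(overall_woba: float, split: float, share_vs_lhp: float) -> tuple[float, float]:
    """Spread an overall wOBA into (vs LHP, vs RHP) estimates.

    The two estimates differ by `split`, and their average weighted by
    `share_vs_lhp` equals `overall_woba`.
    """
    vs_lhp = overall_woba + (1 - share_vs_lhp) * split
    vs_rhp = overall_woba - share_vs_lhp * split
    return vs_lhp, vs_rhp


def calculate(name: str, handedness: str,
              splits: dict[str, SplitLine],
              projections: dict[str, float]) -> tuple[float, float]:
    """Look up a player by exact name, regress his split, apply it to his projection."""
    line = splits[name]  # KeyError if the name doesn't match the source spelling
    split = regressed_split(line, handedness)
    share = line.pa_vs_lhp / (line.pa_vs_lhp + line.pa_vs_rhp)  # assumption: player's own PA mix
    return platoon_woba(projections[name], split, share)


splits = {"Hypothetical Hitter": SplitLine(600, 0.360, 1400, 0.330)}
projections = {"Hypothetical Hitter": 0.335}
print(calculate("Hypothetical Hitter", "R", splits, projections))
```

The returned pair averages back to the overall projection when weighted by the player's share of PAs vs. LHP, and the gap between the two values is the regressed split. As with the spreadsheet, the lookup only works when the name matches the source's spelling exactly.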

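Lastly, for those who are interested: my platoon split regression numbers come from The Book, and my regression numbers for the other statistics come from this article about Pizza Cutter's study of how many plate appearances it takes for each statistic to become reliable. The league averages come from Steve Slowinski's Fangraphs Library. Here is what I'm regressing toward (the number of PAs of league-average performance added to each player's line, followed by that league average):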

  • Pitches/PA - 65 PAs of 3.15
  • Swing% - 20 PAs of 46%
  • Contact% - 40 PAs of 81%
  • K% - 65 PAs of 18.5%
  • BB% - 85 PAs of 8.5%
  • GB% - 85 PAs of 44%
  • FB% - 110 PAs of 36%
  • HR/FB - 130 PAs of 9.5%
  • ISO - 235 PAs of .145
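Here is a minimal Python sketch that applies the amounts above, assuming each stat is regressed by adding the listed number of PAs at the listed league average (percentages are stored as fractions). `REGRESSION_TABLE` and `regress_all` are hypothetical names; like the list, the sketch weights every stat by PA, including the batted-ball rates.

```python
# (PAs of league average to add, league average), copied from the list above.
# Percentages are written as fractions, e.g. 46% -> 0.46.
REGRESSION_TABLE: dict[str, tuple[int, float]] = {
    "Pitches/PA": (65, 3.15),
    "Swing%": (20, 0.46),
    "Contact%": (40, 0.81),
    "K%": (65, 0.185),
    "BB%": (85, 0.085),
    "GB%": (85, 0.44),
    "FB%": (110, 0.36),
    "HR/FB": (130, 0.095),
    "ISO": (235, 0.145),
}


def regress_all(stats: dict[str, float], pa: int) -> dict[str, float]:
    """Regress each observed stat toward its league average.

    `stats` maps a stat name from REGRESSION_TABLE to the player's observed
    value over `pa` plate appearances.
    """
    if pa < 0:
        raise ValueError("pa must be non-negative")
    regressed = {}
    for stat, observed in stats.items():
        regression_pa, league_avg = REGRESSION_TABLE[stat]
        regressed[stat] = (observed * pa + league_avg * regression_pa) / (pa + regression_pa)
    return regressed


# Example: a hypothetical hitter with a .250 ISO and a 30% K rate in 100 PAs.
print(regress_all({"ISO": 0.250, "K%": 0.30}, pa=100))
```

Because ISO carries the largest regression amount on the list, a 100-PA hitter's ISO is pulled much further toward .145 than his K% is pulled toward 18.5%.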

If you think there is an error in my calculations or methodology, please let me know, and I'll update if needed. If there are other stats that you would like to be included in the regression tool, also let me know about those, and I'll try to add them in when I have the chance.