
Clarifying Regression


If the Jason Bartlett thread taught me anything, it's that somehow, some way, people don't understand regression to the mean. Having that concept down almost certainly changes how people read the post's message, and, well, that's my fault for not explaining it better.

Here's what I'm saying:

From this point on (and the specific point isn't important; it could have been May 5th, June 19th, July 30th, whatever), you would expect Bartlett to hit closer to his expected level than to his current rate. This DOES NOT mean he will finish the season with a total line near his expectations*; it simply means that from Point X until season's end, we would expect him to hit closer to expected.**
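To put rough numbers on that distinction, here's a small Python sketch using hypothetical figures: 160 hits in 400 at-bats (.400), a .300 projection, and 250 at-bats left. It assumes he hits exactly at his projection from Point X on, which is what regression predicts on average. The function names are my own, made up for illustration.

```python
def season_line_under_regression(hits_so_far, at_bats_so_far, remaining_at_bats, projected_avg):
    """Expected full-season average if every remaining at-bat goes at the projected rate."""
    expected_remaining_hits = projected_avg * remaining_at_bats
    return (hits_so_far + expected_remaining_hits) / (at_bats_so_far + remaining_at_bats)


def hits_needed_for_final_avg(hits_so_far, at_bats_so_far, remaining_at_bats, target_avg):
    """Hits required over the remaining at-bats to finish the season exactly at target_avg."""
    total_at_bats = at_bats_so_far + remaining_at_bats
    return round(target_avg * total_at_bats) - hits_so_far


# Hypothetical hot start: 160-for-400, projected .300, 250 at-bats to go.
print(round(season_line_under_regression(160, 400, 250, 0.300), 3))  # 0.362
print(hits_needed_for_final_avg(160, 400, 250, 0.300))               # 35
```

Hitting .300 the rest of the way still leaves him around .362 for the year. Finishing at .300 would take 35 hits in 250 at-bats, a .140 pace, and nothing about regression predicts that.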

Some people confused this as me saying:

Since Bartlett hit .400 through his first 400 at-bats, and we have him projected at .300, he's going to go 35 for his next 250 to even things out.

That IS NOT regression. That IS the gambler's fallacy. Take a coin and flip it 10 times. If you get seven heads, that doesn't change your expectation of a 50% chance of tails on the next flip, does it? Nor does it mean that seven of the next 10 flips will land on tails to "correct" the balance.
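If you want to check the coin example yourself, here's a quick simulation sketch. The function name, trial count, and seed are arbitrary choices of mine. It keeps only the runs where the first 10 flips came up exactly seven heads, then counts tails in the next 10.

```python
import random


def tails_after_hot_start(trials=50_000, seed=1):
    """Among runs of 10 fair flips with exactly 7 heads, average the tails in the NEXT 10 flips."""
    rng = random.Random(seed)
    total_tails, matching_runs = 0, 0
    for _ in range(trials):
        first_ten = [rng.random() < 0.5 for _ in range(10)]  # True means heads
        if sum(first_ten) != 7:
            continue  # keep only the "seven heads" starts
        next_ten = [rng.random() < 0.5 for _ in range(10)]
        total_tails += 10 - sum(next_ten)
        matching_runs += 1
    return total_tails / matching_runs


print(tails_after_hot_start())  # lands near 5, not 7
```

The coin has no memory of the first 10 flips, so the next 10 average about five tails, exactly what you'd expect from any fresh set of 10.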

There are counterpoints to be made, and I'm not arguing them. I'm sure someone will take this as an attack on Bartlett; it's not, but people read what they want. We can replace Bartlett's name with any player who gets off to a slow or hot start and the same statements apply.

---

*Yes, we adjust his expectations as this season provides more data.
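One simple way to picture that adjustment (a sketch, not how any particular projection system does it) is to treat the preseason projection as a fixed number of at-bats of .300 hitting and pool them with this year's at-bats. The regression constant of 1,000 at-bats below is a hypothetical value I picked for illustration, not an established figure.

```python
def updated_expectation(hits, at_bats, prior_avg, prior_weight_ab):
    """Blend this season's observed rate with the preseason projection.

    prior_weight_ab is a hypothetical regression constant: the projection is
    treated as if it were that many at-bats of prior_avg hitting.
    """
    return (hits + prior_avg * prior_weight_ab) / (at_bats + prior_weight_ab)


# A hypothetical .400 hitter with a .300 projection, checked at different points in the season.
for ab in (50, 200, 400):
    hits = 0.400 * ab
    print(ab, round(updated_expectation(hits, ab, 0.300, 1000), 3))
```

The estimate creeps upward (roughly .305, .317, .329) as the hot start piles up at-bats, but it stays much closer to .300 than to .400.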

**And how do we get these expectations? By using historical player data dating back three years and regressing it toward the major league mean, or, if you feel daring, toward a comparison group such as all six-foot shortstops of a given age. Marcel doesn't go that in depth; PECOTA does, though the difference is rather minimal despite the extra bells and whistles.
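For the curious, here's a rough Marcel-style sketch: weight the last three seasons 5/4/3 (most recent first), then add a fixed block of league-average at-bats to pull the result toward the mean. Treat the weights, the 1,200 at-bat regression amount, and the omission of any age adjustment as simplifying assumptions on my part rather than the system's exact recipe. The player and league numbers are made up.

```python
def marcel_style_avg(seasons, league_avg, regression_ab=1200, weights=(5, 4, 3)):
    """Weighted three-year batting average regressed toward the league mean.

    seasons: list of (hits, at_bats) tuples, most recent season first.
    weights and regression_ab are illustrative assumptions; no age adjustment.
    """
    weighted_hits = sum(w * hits for w, (hits, _ab) in zip(weights, seasons))
    weighted_ab = sum(w * ab for w, (_hits, ab) in zip(weights, seasons))
    # Add regression_ab at-bats of league-average hitting to pull the estimate toward the mean.
    return (weighted_hits + league_avg * regression_ab) / (weighted_ab + regression_ab)


# Hypothetical player who hit exactly .300 in each of the last three seasons; league hits .265.
print(round(marcel_style_avg([(165, 550), (156, 520), (144, 480)], 0.265), 3))  # 0.294
```

Even three straight .300 seasons get pulled a bit toward the league average, which is the "regressing to the major league mean" step in a nutshell.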