Expected K% & uBB% based upon Pitch Results & Plate Discipline
Lately we've seen quite a few posts relating plate discipline and pitch results to walks and strikeouts. Intuitively this makes sense. The scenario that occurs after a pitch is thrown should have a strong link to strikeouts and walks.
This led me to start a project that uses these results, both plate discipline and pitch results, to build equations via multiple regression that predict expected strikeouts and expected unintentional walks. So far on this site we've only compared and contrasted a few of these results, and in reality there are quite a few. Some that may have a strong impact probably haven't even been measured yet, and I'm not totally sure I was able to grab all of the ones that have.
This is essentially just the start of the project, and I'm not totally sure whether the end results will be good or bad. I'm sure there are independent variables I missed and quite a few that could be removed. There are tons of possible combinations, and tons of tests to run to make sure the model is actually okay to use. So if you want to play around with the data, offer suggestions, or help in any way, please do.
That being said, I did find two pretty solid equations. We can certainly improve on them, but I don't think the results will change that much.
Here are the results. I know many of you don't need or want to get into the statistical stuff and are just interested in what this really means. Essentially, eK% and euBB% are based on certain results (13 possible) ranging from called strikes, first-pitch strikes, fouls, out-of-zone contact, etc.
| Model | Years (qualified pitchers) | Adj R-Squared | MAPE | MSE | RMSE |
| eK% formula | 2003-2008 | 92.7507% | 5.8571% | 0.0138% | 1.1736% |
| euBB% formula | 2003-2008 | 77.4111% | 11.9994% | 0.0101% | 1.0045% |
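For readers who want the definitions behind those columns, here is a minimal sketch of the four fit statistics, assuming actual and predicted rates are decimal fractions, n is the number of pitcher-seasons, and p is the number of predictors (11 for each formula). `fit_stats` is a hypothetical name, not something from the author's workbooks. MSE here is the residual sum of squares divided by n; some regression tools divide by n - p - 1 instead, so the workbook's figures may differ slightly. Either way RMSE is the square root of MSE, which is why the 1.1736% RMSE for eK% squares to the 0.0138% MSE shown.

```python
import math

def fit_stats(actual, predicted, p):
    """Return adjusted R^2, MAPE, MSE and RMSE for one regression.

    actual, predicted: equal-length sequences of rates as decimal fractions
    p: number of independent variables in the model (intercept excluded)
    """
    n = len(actual)
    if n != len(predicted) or n <= p + 1:
        raise ValueError("need matching sequences with more rows than predictors")
    residuals = [a - f for a, f in zip(actual, predicted)]
    mean_a = sum(actual) / n
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)     # penalizes extra predictors
    mape = sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    mse = ss_res / n                                   # assumption: divide by n
    return {"adj_r2": adj_r2, "mape": mape, "mse": mse, "rmse": math.sqrt(mse)}
```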
| Pitcher | K% | eK% | Error (eK% - K%) | uBB% | euBB% | Error (euBB% - uBB%) |
| 2009 Notable Rays Players | | | | | | |
| Sonnanstine | 13.81% | 16.03% | 2.21% | 5.80% | 4.93% | -0.88% |
| Wheeler | 17.21% | 16.63% | -0.58% | 4.10% | 3.31% | -0.79% |
| Price | 23.04% | 22.53% | -0.51% | 15.20% | 10.45% | -4.75% |
| Garza | 20.88% | 19.27% | -1.61% | 9.67% | 9.41% | -0.26% |
| Balfour | 23.43% | 22.27% | -1.16% | 11.43% | 10.53% | -0.90% |
| Niemann | 12.85% | 13.96% | 1.11% | 9.78% | 8.90% | -0.87% |
| Nelson | 21.19% | 22.20% | 1.01% | 12.58% | 9.68% | -2.90% |
| Kazmir | 16.89% | 17.64% | 0.75% | 11.15% | 8.65% | -2.50% |
| Shields | 17.22% | 17.05% | -0.17% | 4.70% | 4.16% | -0.53% |
| 2009 Other League Notables | | | | | | |
| Baker | 19.80% | 20.14% | 0.34% | 4.82% | 5.30% | 0.48% |
| Beckett | 22.06% | 21.36% | -0.70% | 7.28% | 6.58% | -0.70% |
| Billingsley | 23.52% | 23.72% | 0.20% | 9.41% | 8.62% | -0.79% |
| Braden | 14.96% | 15.75% | 0.79% | 5.80% | 4.70% | -1.10% |
| Burnett | 21.91% | 20.05% | -1.86% | 11.50% | 9.67% | -1.82% |
| Cain | 19.83% | 20.16% | 0.33% | 8.96% | 6.29% | -2.67% |
| Danks | 21.14% | 22.25% | 1.11% | 7.96% | 8.04% | 0.08% |
| Dempster | 19.69% | 20.67% | 0.98% | 9.07% | 8.75% | -0.32% |
| Feldman | 12.33% | 13.55% | 1.22% | 8.22% | 8.89% | 0.67% |
| Galarraga | 15.23% | 17.51% | 2.27% | 10.07% | 7.66% | -2.41% |
| Gallardo | 26.14% | 23.23% | -2.92% | 10.68% | 10.43% | -0.25% |
| Greinke | 25.49% | 20.71% | -4.79% | 4.15% | 6.95% | 2.80% |
| Halladay | 21.35% | 20.48% | -0.88% | 3.70% | 3.42% | -0.28% |
| Hamels | 20.20% | 21.42% | 1.22% | 4.35% | 5.59% | 1.24% |
| Hammel | 15.78% | 14.63% | -1.15% | 4.81% | 5.70% | 0.89% |
| Haren | 25.87% | 24.58% | -1.29% | 3.26% | 5.53% | 2.27% |
| Hernandez | 23.51% | 22.39% | -1.11% | 7.01% | 6.70% | -0.31% |
| E Jackson | 20.35% | 19.87% | -0.48% | 7.00% | 7.57% | 0.57% |
| Josh Johnson | 20.65% | 20.16% | -0.48% | 5.87% | 5.84% | -0.03% |
| Ra Johnson | 20.46% | 21.14% | 0.68% | 7.42% | 6.49% | -0.93% |
| Jurrjens | 16.63% | 17.08% | 0.45% | 8.87% | 6.68% | -2.19% |
| Kershaw | 24.48% | 22.85% | -1.63% | 13.14% | 11.24% | -1.90% |
| Cliff Lee | 16.41% | 15.40% | -1.01% | 5.79% | 5.09% | -0.71% |
| Lester | 27.39% | 24.85% | -2.54% | 7.35% | 8.50% | 1.15% |
| Lilly | 21.46% | 21.26% | -0.20% | 4.87% | 4.76% | -0.11% |
| Lincecum | 28.95% | 24.38% | -4.57% | 5.95% | 6.67% | 0.72% |
| Liriano | 20.71% | 20.41% | -0.30% | 10.35% | 9.09% | -1.27% |
| Lowe | 11.97% | 12.83% | 0.86% | 6.84% | 6.70% | -0.14% |
| Oswalt | 18.86% | 19.36% | 0.50% | 5.93% | 5.02% | -0.92% |
| Owings | 13.83% | 16.08% | 2.25% | 10.12% | 7.92% | -2.20% |
| Pavano | 16.67% | 16.66% | -0.01% | 4.69% | 3.92% | -0.78% |
| Penny | 15.40% | 14.66% | -0.74% | 7.07% | 7.26% | 0.19% |
| Pettitte | 15.10% | 16.10% | 1.01% | 8.97% | 8.31% | -0.66% |
| Porcello | 12.67% | 12.91% | 0.24% | 8.36% | 7.17% | -1.19% |
| Rodriguez | 22.84% | 22.02% | -0.83% | 7.76% | 8.03% | 0.27% |
| Sabathia | 17.98% | 19.82% | 1.84% | 6.46% | 6.50% | 0.03% |
| Joh Santana | 23.11% | 24.14% | 1.03% | 7.34% | 4.56% | -2.78% |
| Scherzer | 23.15% | 24.24% | 1.09% | 9.07% | 7.93% | -1.14% |
| Vazquez | 28.51% | 27.12% | -1.39% | 4.82% | 5.56% | 0.74% |
| Verlander | 29.50% | 27.45% | -2.05% | 6.49% | 6.58% | 0.10% |
| Je Weaver | 20.30% | 19.45% | -0.85% | 7.05% | 5.89% | -1.16% |
| Zito | 18.00% | 19.09% | 1.09% | 8.88% | 7.79% | -1.09% |
* There is no 2009 data for J.P. Howell on StatCorner, which is why he isn't listed.
**Both models are pretty accurate, and eK% is very accurate. The euBB% model, however, seems biased toward negative errors: for most pitchers in the table it predicts a lower walk rate than the actual one. This is something that would have to be fixed (hence why help would be great).
All in all, I've accumulated 13 independent variables across 2003-2009, plus the dependent variable for each model, K% and uBB%. I ran the regressions on data for qualified pitchers (+/- a few) from 2003-2008, using 2009 as a test, or holdout, period.
I believe I found the highest Adj R-squared for both models. Each equation uses only 11 of the 13 independent variables. I'll link the workbooks at the end, so if you want to look over the models and statistics, they will be there. I also included the numbers for a bunch of different tests, so feel free to check them out (I really haven't looked very hard at them yet).
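One quick way to put a number on that bias is the mean signed error (the average of euBB% - uBB%) together with the share of pitchers whose error is negative. A minimal sketch, using only the uBB% error column for the nine Rays pitchers from the table above (values in percentage points). `bias_summary` is a hypothetical helper name.

```python
def bias_summary(errors):
    """Mean signed error and share of negative errors.

    errors: expected minus actual, e.g. euBB% - uBB% in percentage points.
    A mean well below zero with most errors negative suggests the model
    systematically under-predicts.
    """
    n = len(errors)
    mean_error = sum(errors) / n
    share_negative = sum(1 for e in errors if e < 0) / n
    return mean_error, share_negative

# uBB% errors (euBB% - uBB%) for the 2009 Rays pitchers listed in the table
rays_ubb_errors = [-0.88, -0.79, -4.75, -0.26, -0.90, -0.87, -2.90, -2.50, -0.53]
mean_err, neg_share = bias_summary(rays_ubb_errors)
print(f"mean error {mean_err:+.2f} pts, {neg_share:.0%} negative")
```

For these nine pitchers every error is negative and the mean is about -1.6 points. Running the same function over the full table would show whether the pattern holds across the league notables too.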
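A minimal sketch of that setup: fit ordinary least squares on the 2003-2008 rows, then score the untouched 2009 rows. It assumes each pitcher-season is a row of predictor values plus a dependent rate. The data here are synthetic (two made-up predictors with known coefficients), and `ols_fit`/`predict` are hypothetical names, not the regression tool the author used. Solving the normal equations directly is fine at this size, but it can be numerically fragile when predictors are highly collinear, as several of the pitch-result variables are (Sw Str vs. Con is -0.94).

```python
import random

def ols_fit(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: list of rows (one list of predictor values per pitcher-season)
    y: list of dependent values (e.g. K% as a decimal fraction)
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    k = len(rows[0])
    # Build X'X augmented with X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] +
         [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Synthetic stand-in data (NOT real pitcher stats): two made-up predictors
rng = random.Random(0)
data = []
for year in range(2003, 2010):
    for _ in range(10):
        x1, x2 = rng.random(), rng.random()
        y = 0.10 + 0.50 * x1 - 0.20 * x2 + rng.gauss(0, 0.001)
        data.append((year, [x1, x2], y))

# Fit on 2003-2008, hold out 2009 as the test period
train = [(x, y) for yr, x, y in data if yr <= 2008]
test = [(x, y) for yr, x, y in data if yr == 2009]
coefs = ols_fit([x for x, _ in train], [y for _, y in train])
holdout_rmse = (sum((predict(coefs, x) - y) ** 2 for x, y in test) / len(test)) ** 0.5
print(coefs, holdout_rmse)
```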
Here are the two equations that I believe had the highest Adj R-Sq:
K = 0.34523 + ( (Ball) * -0.092208 ) + ( (ClStr) * 0.642177 ) + ( (SwStr) * 1.35 ) + ( (Foul) * 0.981356 ) + ( (InPly) * -0.343883 ) + ( (Oswing) * -0.015719 ) + ( (Zswing) * -0.146531 ) + ( (Swing) * -0.42555 ) + ( (Ocontact) * -0.038438 ) + ( (Contact) * -0.184088 ) + ( (Fstrike) * -0.000762 )
uBB = 0.58193 + ( (Ball) * 0.05506 ) + ( (ClStr) * -0.443504 ) + ( (SwStr) * -0.303051 ) + ( (Foul) * 0.092248 ) + ( (InPly) * -0.352155 ) + ( (Oswing) * -0.055224 ) + ( (Swing) * -0.366769 ) + ( (Ocontact) * 0.005447 ) + ( (Contact) * -0.173878 ) + ( (Zone) * -0.043222 ) + ( (Fstrike) * -0.053383 )
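To make the two equations easier to reuse, here they are as a Python sketch with the coefficients copied from above. The input format is an assumption: the intercepts (0.34523 and 0.58193) only make sense if each rate is entered as a decimal fraction (0.085 for an 8.5% swinging-strike rate), and the output is then a fraction too. The dictionary keys simply reuse the variable names from the equations.

```python
# Coefficients copied from the eK and euBB equations in the post.
# Assumption: every input rate is a decimal fraction (e.g. 0.085 for 8.5%).
EK_INTERCEPT = 0.34523
EK_COEFS = {
    "Ball": -0.092208, "ClStr": 0.642177, "SwStr": 1.35, "Foul": 0.981356,
    "InPly": -0.343883, "Oswing": -0.015719, "Zswing": -0.146531,
    "Swing": -0.42555, "Ocontact": -0.038438, "Contact": -0.184088,
    "Fstrike": -0.000762,
}
EUBB_INTERCEPT = 0.58193
EUBB_COEFS = {
    "Ball": 0.05506, "ClStr": -0.443504, "SwStr": -0.303051, "Foul": 0.092248,
    "InPly": -0.352155, "Oswing": -0.055224, "Swing": -0.366769,
    "Ocontact": 0.005447, "Contact": -0.173878, "Zone": -0.043222,
    "Fstrike": -0.053383,
}

def linear_predict(intercept, coefs, rates):
    """Evaluate intercept + sum(coef * rate); raises KeyError if a rate is missing."""
    return intercept + sum(c * rates[name] for name, c in coefs.items())

def expected_k(rates):
    return linear_predict(EK_INTERCEPT, EK_COEFS, rates)

def expected_ubb(rates):
    return linear_predict(EUBB_INTERCEPT, EUBB_COEFS, rates)
```

Because the model is linear, the what-if experiments the workbook allows reduce to coefficient arithmetic. For example, raising SwStr by 0.01 (one percentage point) with everything else fixed moves eK by 1.35 * 0.01 = 0.0135, or about 1.35 points of K%.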
Like I said before, I'm sure there is a better equation out there, and probably a simpler one as well. Feel free to mess around. I've mixed and matched a bit, and in fact I found another K equation from the 2003-2008 data that actually fits the 2009 data better than the equation above. The difference isn't huge, though.
These are the independent variables that I used (all in percent):
Balls, Cl Strike, Sw Strike, Foul, In Play, O Swing, Z Swing, Swing, O Contact, Z Contact, Contact, Zone, F Strike
Correlation matrix for K% (split up into two for easier viewing):
| | K | Ball | Cl Str | Sw Str | Foul | In Play | O Swi |
| K | 1.00 | -0.27 | -0.02 | 0.85 | 0.41 | -0.80 | 0.34 |
| Ball | -0.27 | 1.00 | -0.22 | -0.31 | -0.54 | -0.24 | -0.38 |
| Cl Str | -0.02 | -0.22 | 1.00 | -0.27 | -0.34 | 0.16 | -0.09 |
| Sw Str | 0.85 | -0.31 | -0.27 | 1.00 | 0.31 | -0.65 | 0.44 |
| Foul | 0.41 | -0.54 | -0.34 | 0.31 | 1.00 | -0.30 | 0.27 |
| In Play | -0.80 | -0.24 | 0.16 | -0.65 | -0.30 | 1.00 | -0.14 |
| O Swi | 0.34 | -0.38 | -0.09 | 0.44 | 0.27 | -0.14 | 1.00 |
| Z Swi | 0.01 | -0.24 | -0.67 | 0.22 | 0.51 | 0.05 | -0.21 |
| Swing | 0.27 | -0.80 | -0.35 | 0.46 | 0.72 | 0.09 | 0.44 |
| O Con | -0.57 | 0.07 | 0.15 | -0.66 | 0.01 | 0.41 | 0.09 |
| Z Con | -0.74 | 0.17 | 0.19 | -0.78 | -0.30 | 0.66 | -0.02 |
| Con | -0.85 | 0.13 | 0.22 | -0.94 | -0.15 | 0.71 | -0.29 |
| Zone | -0.01 | -0.61 | 0.24 | -0.05 | 0.34 | 0.28 | -0.37 |
| F Str | 0.14 | -0.79 | 0.29 | 0.17 | 0.36 | 0.25 | 0.31 |
| | Z Swi | Swing | O Con | Z Con | Con | Zone | F Str |
| K | 0.01 | 0.27 | -0.57 | -0.74 | -0.85 | -0.01 | 0.14 |
| Ball | -0.24 | -0.80 | 0.07 | 0.17 | 0.13 | -0.61 | -0.79 |
| Cl Str | -0.67 | -0.35 | 0.15 | 0.19 | 0.22 | 0.24 | 0.29 |
| Sw Str | 0.22 | 0.46 | -0.66 | -0.78 | -0.94 | -0.05 | 0.17 |
| Foul | 0.51 | 0.72 | 0.01 | -0.30 | -0.15 | 0.34 | 0.36 |
| In Play | 0.05 | 0.09 | 0.41 | 0.66 | 0.71 | 0.28 | 0.25 |
| O Swi | -0.21 | 0.44 | 0.09 | -0.02 | -0.29 | -0.37 | 0.31 |
| Z Swi | 1.00 | 0.63 | -0.24 | -0.29 | -0.14 | 0.30 | 0.11 |
| Swing | 0.63 | 1.00 | -0.13 | -0.26 | -0.23 | 0.46 | 0.61 |
| O Con | -0.24 | -0.13 | 1.00 | 0.44 | 0.75 | -0.10 | 0.00 |
| Z Con | -0.29 | -0.26 | 0.44 | 1.00 | 0.81 | -0.11 | -0.01 |
| Con | -0.14 | -0.23 | 0.75 | 0.81 | 1.00 | 0.16 | 0.00 |
| Zone | 0.30 | 0.46 | -0.10 | -0.11 | 0.16 | 1.00 | 0.52 |
| F Str | 0.11 | 0.61 | 0.00 | -0.01 | 0.00 | 0.52 | 1.00 |
What is notable:
The correlations pretty much make sense. Swinging strikes are highly correlated with K% (0.85). Anything to do with contact, especially balls in play (-0.80), is strongly negatively correlated. What is really interesting is that called strikes, first-pitch strikes, and zone aren't what I originally expected. First, called strikes are barely negatively correlated (-0.02). A bit strange, but perhaps that makes sense on some level. Or perhaps it points to a problem with the model. Thoughts? I would also have expected Zone (-0.01) and F-Strike (0.14) to have a higher correlation. In a way, though, it makes sense: if you are throwing in the zone (especially for the first strike), you have a higher chance of a ball in play, which eliminates the K potential.
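Each cell in these matrices is a Pearson correlation coefficient between two columns of per-pitcher rates. A minimal standard-library sketch of how the cells are computed; the `pearson` and `corr_matrix` names and the toy numbers are illustrative, not from the author's data (`statistics.correlation` in Python 3.10+ gives the same r).

```python
import math

def pearson(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def corr_matrix(columns):
    """columns: dict name -> list of values; returns dict of dicts of r."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy per-pitcher rates (made up, not from the post's data)
toy = {"K": [0.18, 0.22, 0.25, 0.15], "SwStr": [0.07, 0.09, 0.11, 0.06]}
print(round(corr_matrix(toy)["K"]["SwStr"], 2))
```

A near-zero r, like the -0.02 for called strikes, only says there is no linear relationship between the two columns on their own. It does not rule out that called strikes matter once the other variables are held fixed, which is what the regression coefficients measure; that is how ClStr can have a clearly positive coefficient (0.642177) in the eK equation.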
Correlation matrix for uBB% (split up into two for easier viewing):
| | uBB | Ball | Cl Str | Sw Str | Foul | In Ply | O Swi |
| uBB | 1.00 | 0.75 | -0.27 | 0.07 | -0.19 | -0.56 | -0.27 |
| Ball | 0.75 | 1.00 | -0.22 | -0.31 | -0.54 | -0.24 | -0.38 |
| Cl Str | -0.27 | -0.22 | 1.00 | -0.27 | -0.34 | 0.16 | -0.09 |
| Sw Str | 0.07 | -0.31 | -0.27 | 1.00 | 0.31 | -0.65 | 0.44 |
| Foul | -0.19 | -0.54 | -0.34 | 0.31 | 1.00 | -0.30 | 0.27 |
| In Ply | -0.56 | -0.24 | 0.16 | -0.65 | -0.30 | 1.00 | -0.14 |
| O Swi | -0.27 | -0.38 | -0.09 | 0.44 | 0.27 | -0.14 | 1.00 |
| Z Swi | -0.10 | -0.24 | -0.67 | 0.22 | 0.51 | 0.05 | -0.21 |
| Swing | -0.57 | -0.80 | -0.35 | 0.46 | 0.72 | 0.09 | 0.44 |
| O Con | -0.17 | 0.07 | 0.15 | -0.66 | 0.01 | 0.41 | 0.09 |
| Z Con | -0.22 | 0.17 | 0.19 | -0.78 | -0.30 | 0.66 | -0.02 |
| Con | -0.25 | 0.13 | 0.22 | -0.94 | -0.15 | 0.71 | -0.29 |
| Zone | -0.51 | -0.61 | 0.24 | -0.05 | 0.34 | 0.28 | -0.37 |
| F Str | -0.70 | -0.79 | 0.29 | 0.17 | 0.36 | 0.25 | 0.31 |
| | Z Swi | Swing | O Con | Z Con | Con | Zone | F Str |
| uBB | -0.10 | -0.57 | -0.17 | -0.22 | -0.25 | -0.51 | -0.70 |
| Ball | -0.24 | -0.80 | 0.07 | 0.17 | 0.13 | -0.61 | -0.79 |
| Cl Str | -0.67 | -0.35 | 0.15 | 0.19 | 0.22 | 0.24 | 0.29 |
| Sw Str | 0.22 | 0.46 | -0.66 | -0.78 | -0.94 | -0.05 | 0.17 |
| Foul | 0.51 | 0.72 | 0.01 | -0.30 | -0.15 | 0.34 | 0.36 |
| In Ply | 0.05 | 0.09 | 0.41 | 0.66 | 0.71 | 0.28 | 0.25 |
| O Swi | -0.21 | 0.44 | 0.09 | -0.02 | -0.29 | -0.37 | 0.31 |
| Z Swi | 1.00 | 0.63 | -0.24 | -0.29 | -0.14 | 0.30 | 0.11 |
| Swing | 0.63 | 1.00 | -0.13 | -0.26 | -0.23 | 0.46 | 0.61 |
| O Con | -0.24 | -0.13 | 1.00 | 0.44 | 0.75 | -0.10 | 0.00 |
| Z Con | -0.29 | -0.26 | 0.44 | 1.00 | 0.81 | -0.11 | -0.01 |
| Con | -0.14 | -0.23 | 0.75 | 0.81 | 1.00 | 0.16 | 0.00 |
| Zone | 0.30 | 0.46 | -0.10 | -0.11 | 0.16 | 1.00 | 0.52 |
| F Str | 0.11 | 0.61 | 0.00 | -0.01 | 0.00 | 0.52 | 1.00 |
What is notable:
Well, for the most part the obvious things hold true. Balls are strongly positively correlated with uBB% (0.75). Swings and contact are, for the most part, negatively correlated. A key to limiting walks would be throwing first-pitch strikes. That is obviously very intuitive, but the large negative correlation (-0.70) bears it out.
I think I'm going to limit this post to just this. I'll answer whatever questions I can in the comments. I also have quite a few thoughts on the players themselves, but I wanted to save those for the comments.
All of the data, as well as the audit for the regressions, will be linked just below. Check them out. Once I hear some suggestions, thoughts, and opinions, I'll know what step to take next, if any.
If you want to play around with the notable and Rays pitchers (for example, changing the value of any independent variable for a specific pitcher to see what happens to his expected rates), click here:
If you want to look at the regressions, the audits, and all the regression statistics, as well as how the equations perform on the sample and on the holdout period, click here:
eK regression with highest Adj R sq
euBB regression with highest Adj R sq
If you want to run your own regressions on the data set (deleting a variable is simple; to add one, you have to find the data and add it to the sheet), click here:
For anyone interested, the biggest problem with this project was easily collecting and consolidating the data from Fangraphs and StatCorner. Once the data was consolidated, running the regressions and testing on the holdout period was quite easy. Of course, with the sheer number of possible variable combinations, testing everything would be very time-consuming.
My fantasy would be to create an accurate eK% and euBB% from these sorts of variables and then plug them in as part of an expected FIP.
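As a rough illustration of that last idea, here is a sketch that plugs expected rates into the standard FIP shape, (13*HR + 3*(BB + HBP) - 2*K) / IP + constant. Several assumptions are involved. K% and uBB% are taken to be per batter faced. The walk term uses unintentional walks (versions of FIP differ on whether intentional walks count). HR and HBP are left at their actual values. The season constant that scales FIP to ERA is a required input rather than a hard-coded number. `expected_fip` is a hypothetical name.

```python
def expected_fip(ek_rate, eubb_rate, tbf, ip, hr, hbp, fip_constant):
    """FIP computed with expected K and uBB counts in place of actual ones.

    ek_rate, eubb_rate: expected K% and uBB% as fractions of batters faced
    tbf: total batters faced
    ip: innings pitched as a true decimal (6.1 IP in box-score notation = 6.333...)
    hr, hbp: actual home runs and hit batters (left as-is)
    fip_constant: league/season constant that scales FIP to ERA (assumed input)
    """
    exp_k = ek_rate * tbf
    exp_ubb = eubb_rate * tbf
    return (13 * hr + 3 * (exp_ubb + hbp) - 2 * exp_k) / ip + fip_constant

# Illustrative numbers only, not any real pitcher's line
print(expected_fip(0.20, 0.08, tbf=100, ip=25, hr=2, hbp=1, fip_constant=3.0))
```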
49 comments
Comments
Basically if you have Greinke on your fantasy team you should probably trade him
If you haven’t already done so. He is really outperforming what his K% and uBB% should be. I’m sure there is another reason why they are as good as what they are, but even still you should expect a significant decline in those rates.
by matthan on Jul 10, 2009 12:40 PM EDT
Before I actually read through all of this, I should just say
PREPARE TO BE FIREBOMBED.
Kidding, of course.
by Suttree on Jul 10, 2009 12:43 PM EDT
This is amazing. So far I'm at your first formula, and I'm surprised how little value a called strike carries
A swinging strike is about twice as important as a called strike and a foul ball is nearly half again as important as a called strike. Any reason you think this is the case?
SOSH AUCTION to K ALS
by Sandy Kazmir on Jul 10, 2009 1:07 PM EDT
protecting the plate w/ 2 strikes
Follow Me on Twitter @FreeZorilla
by FreeZorilla on Jul 10, 2009 1:14 PM EDT
But what about called strikes, perhaps the numbers bear out that more guys strike out on a swing than looking
Although, Beej and Burrell have to lead the league in called strike 3
by Sandy Kazmir on Jul 10, 2009 1:16 PM EDT
that's what I meant, hitters generally swing at close pitches w/ 2 strikes
the BJs are the exception
by FreeZorilla on Jul 10, 2009 1:17 PM EDT
I had a major reply fail
Basically this is probably the best way I can explain it given what I see in the data:
Hitter A:
Swings and misses at strike 1
Hitter B:
Lets called strike 1 go by
Hitter A is more likely to strike out because he has demonstrated the ability not to make contact whereas hitter B has not shown that ability. So for strike 2, if both hitters decide to swing it is more likely Hitter B will put the ball in play.
by matthan on Jul 10, 2009 1:18 PM EDT
Yeah I got that
by Sandy Kazmir on Jul 10, 2009 1:35 PM EDT
I think the answer lies in the correlations between sw str and called str vs. anything to do with contact
The data shows that if you generate swinging strikes it is usually tougher to make contact against you. It means essentially the same thing (which unfortunately means a bit of overlap among the variables). However on the flip side taking a called strike doesn’t really show the contact abilities. So their chances of hitting the ball in play when they do decide to swing is quite a bit higher than the hitter that swung and missed at strike 1.
by matthan on Jul 10, 2009 1:15 PM EDT
This was a major reply fail to Sandy Kazmir's comment above.
by matthan on Jul 10, 2009 1:16 PM EDT
That makes sense, I guess a strike isn't a strike isn't a strike.
by Sandy Kazmir on Jul 10, 2009 1:17 PM EDT
This definitively proves that we should fire Jim Hickey.
I can't wait until we trade him for a reliever.
by kericr on Jul 10, 2009 1:16 PM EDT
wow that took a long time, you're slipping
by FreeZorilla on Jul 10, 2009 1:18 PM EDT
He was too busy servicing Hickey to be quick
by matthan on Jul 10, 2009 1:19 PM EDT
Yep. Had to wash up and gargle.
by kericr on Jul 10, 2009 1:19 PM EDT
At least you are clean about it
Suttree just does it behind the bushes
by matthan on Jul 10, 2009 1:20 PM EDT
Well Hickey at least knows how to control his load enough to distribute it evenly instead of drowning one of us while being dry for the other.
by kericr on Jul 10, 2009 1:22 PM EDT
Not that there is much wrong with being a drunk
by matthan on Jul 10, 2009 1:25 PM EDT
Eight is enough.
by kericr on Jul 10, 2009 1:27 PM EDT
Does he yell at you to
establish or command the erection?
www.bucem.com - SBNation's source for all things Buccaneer
by Buc Wild on Jul 10, 2009 1:28 PM EDT
Anyways, this definitely backs up RJ's claim that Price should be walking far fewer people
Due to what is happening with his pitches it is unsustainable that he walks this many people. TBH I’m not sure if this data includes last night. I grabbed the stuff off FG and Statcorner this morning.
by matthan on Jul 10, 2009 1:30 PM EDT
FG updates nightly.
I’m not sure if SC updates around the same time or earlier/later.
by R.J. Anderson on Jul 11, 2009 11:42 PM EDT
League Averages
K-looking/PA= 4.5%
K-Swinging/PA=13.1%
I think this is the heart of the big discrepancy. How strike 1 is accomplished should matter less. Many hitters will change their approach with two strikes, i.e., expand their zone, choke up, swing more for contact.
Follow Me on Twitter @FreeZorilla
by FreeZorilla on Jul 10, 2009 1:51 PM EDT
That could be another cause practically although I think I found the more statistical answer
Essentially, the stdev of swinging strikes is quite a bit higher than that of called strikes. This leads me to believe that pitchers control swinging strikes far more than called strikes. It also means that called strike% is really bunched up across pitchers and just won't correlate that much with K%.
by matthan on Jul 10, 2009 1:57 PM EDT
Makes Sense
Pitchers can't control a hitter's choice to swing. I do think the standard deviation of called strikes beyond strike 1 would be far greater.
by FreeZorilla on Jul 10, 2009 2:21 PM EDT
Also there may be some redundancy in there
For example, when I did my bar graphs of pitch results last week, it was all from FanGraphs (I didn't think of using StatCorner). Things like O-/Z-Swing, O-/Z-Contact, and Zone% are already factored into balls, called strikes, and swinging strikes.
The area I was unable to tap into is differentiating between fouls and balls in play (the two types of contact) in and out of the zone. But balls are being double counted if you use both Balls and O-Swing.
Not sure if this is clear or not, let me know.
by FreeZorilla on Jul 10, 2009 1:56 PM EDT
For sure
I wanted to include all the variables I could think of in one sheet, as it is far easier to delete a variable than to add one. If you think another combination of independent variables would give a better result, then by all means give it a go. The last sheet I listed has all the info. Just delete the columns you don't need and then run the regression.
by matthan on Jul 10, 2009 1:58 PM EDT
Not going to get into this due to time
Here are my suggestions:
Balls
Called Strikes
(1 - Zone%) * O-Swing% * (1 - O-Contact%) = swinging strikes out of the zone
Zone% * Z-Swing% * (1 - Z-Contact%) = swinging strikes in the zone
Then it gets more confusing, as you can't differentiate between fouls and balls in play using O and Z.
I'm not sure whether it's more reliable to ignore the zone and focus on the contact result (just use in play and fouls), ignore the contact result and focus on the zone (see below), or just use both knowing there is some overweighting.
Zone contact = Zone% * Z-Swing% * Z-Contact%
O contact = (1 - Zone%) * O-Swing% * O-Contact%
by FreeZorilla on Jul 10, 2009 2:34 PM EDT
BTW
The program I used essentially picks the combination of variables with the highest adj R squared (at least it is supposed to). So picking from the 13 won't give you a higher adj R squared on the 2003-2008 data. However, if you split some of the variables apart, or add something that I missed, you may be able to get a better result.
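An exhaustive best-subset search is one way a program can do that: fit every combination of candidate variables and keep the one with the highest adjusted R-squared. A minimal sketch under stated assumptions. This is not the author's program, and `best_subset`, `ols_r2`, and the synthetic columns are hypothetical. With 13 candidates there are 2^13 - 1 = 8191 subsets, which is still quick, but the count doubles with every variable added. Also, if some variables summed to exactly 100% for every pitcher, X'X would be singular and this normal-equations solver would divide by zero; a QR-based least-squares routine is safer for real data.

```python
import itertools
import random

def ols_r2(X, y):
    """Fit OLS with an intercept via the normal equations; return R^2."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] +
         [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):                      # Gauss-Jordan, partial pivoting
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    my = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1 - ss_res / ss_tot

def best_subset(columns, y):
    """Try every non-empty subset of predictors; return (adj_r2, names) of the best."""
    names = list(columns)
    n = len(y)
    best = (float("-inf"), ())
    for size in range(1, len(names) + 1):
        if n - size - 1 <= 0:                 # not enough rows for this many predictors
            break
        for subset in itertools.combinations(names, size):
            X = [[columns[c][i] for c in subset] for i in range(n)]
            r2 = ols_r2(X, y)
            adj = 1 - (1 - r2) * (n - 1) / (n - size - 1)
            if adj > best[0]:
                best = (adj, subset)
    return best

# Synthetic example: y depends on "a" and "b"; "noise" is irrelevant
rng = random.Random(1)
n = 50
cols = {name: [rng.random() for _ in range(n)] for name in ("a", "b", "noise")}
y = [0.1 + 0.5 * a - 0.2 * b + rng.gauss(0, 0.01)
     for a, b in zip(cols["a"], cols["b"])]
adj, chosen = best_subset(cols, y)
print(round(adj, 3), chosen)
```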
I’m going to look into this suggestion and see what I come up with. Thanks for the lead.
by matthan on Jul 10, 2009 3:00 PM EDT
What sucks is that StatCorner and Fangraphs don't exactly match
I’m not sure why but they classify pitches differently
by matthan on Jul 10, 2009 3:39 PM EDT
The error is probably slight though no?
The only data you really need StatCorner for is to differentiate between contact in play and foul balls.
For called strikes use: Zone% * (1 - Z-Swing%)
For balls use: (1 - Zone%) * (1 - O-Swing%)
by FreeZorilla on Jul 10, 2009 3:42 PM EDT
I'm looking at swinging strikes in the zone and swinging strikes out of the zone
In theory they should add up to “swinging strike”
However, SwStrOZone + SwStrInZone > SwStr for 99% of pitchers
Not always by much, but for a couple of guys the difference is over 1%
by matthan on Jul 10, 2009 3:59 PM EDT
I think I know what the problem may be
by matthan on Jul 10, 2009 4:01 PM EDT
Either way this is my new task
I'm going to try to gather up all the possible outcomes once a pitcher throws a pitch
All as a % of total pitches thrown
Ball%
Call Str%
OZ SwStr%
InZ SwStr%
Oz Foul%
Oz InPly%
InZ Foul%
InZ InPly%
In theory these should all add up to 100%. They cover the ball and every possible strike or contact outcome.
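A sketch of that eight-way split, built from the formulas suggested earlier in the thread (balls, called strikes, and in/out-of-zone swinging strikes and contact) plus weighting contact by the overall foul share. Assumptions: every take outside the zone is a ball and every take in the zone is a called strike (umpires don't always agree, so this is approximate), and the foul share of contact is the same in and out of the zone. The function and key names are hypothetical. By construction the eight shares sum to exactly 1.

```python
def pitch_outcome_shares(zone, o_swing, z_swing, o_contact, z_contact, foul_share):
    """Split all pitches into 8 mutually exclusive outcome shares.

    All inputs are decimal fractions:
      zone       - share of pitches in the zone
      o_swing    - swing rate on pitches outside the zone
      z_swing    - swing rate on pitches in the zone
      o_contact  - contact rate on swings outside the zone
      z_contact  - contact rate on swings in the zone
      foul_share - fouls / (fouls + balls in play), assumed equal in and out of zone
    """
    out_zone = 1 - zone
    o_contact_share = out_zone * o_swing * o_contact
    z_contact_share = zone * z_swing * z_contact
    return {
        "Ball": out_zone * (1 - o_swing),          # takes outside the zone
        "CallStr": zone * (1 - z_swing),           # takes inside the zone
        "OZ_SwStr": out_zone * o_swing * (1 - o_contact),
        "InZ_SwStr": zone * z_swing * (1 - z_contact),
        "OZ_Foul": o_contact_share * foul_share,
        "OZ_InPly": o_contact_share * (1 - foul_share),
        "InZ_Foul": z_contact_share * foul_share,
        "InZ_InPly": z_contact_share * (1 - foul_share),
    }

# Illustrative inputs only
shares = pitch_outcome_shares(0.5, 0.25, 0.65, 0.6, 0.88, 0.5)
print({k: round(v, 4) for k, v in shares.items()})
```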
The contact outcomes I'd have to estimate because, like you said, there is no way to differentiate between in-zone and out-of-zone fouls. I'd have to think of a way to do it. Perhaps weight them by the foul and in-play %s.
Either way, once I get those 8 metrics and run the regression, I think we will have our strongest relationship yet.
by matthan on Jul 10, 2009 4:06 PM EDT
Agree
I would figure out the % of overall contact that is fair vs. foul and weight both in-zone and out-of-zone contact with those %s. It's not perfect, but I'm not sure of a better way.
by FreeZorilla on Jul 10, 2009 5:34 PM EDT
I just did this and the results were pretty good
I need to fine tune and I think I can make it a tad better
Regarding InZ Foul and InZ InPlay, what I did was find the # of pitches in the zone and multiply it by foul% and in-play%. Not perfect for sure, but not bad.
The results were pretty surprising. I do think this model is better. The adj R squared is slightly lower, but I think the variables are better. It's Friday night now, so I have to put an end to this for now, but I think we're making some serious progress.
by matthan on Jul 10, 2009 6:29 PM EDT
I'll see what I can do
It would be awesome to eliminate the overlap. The way it is now is okay, but it can become even stronger and essentially rock solid if we were able to eliminate the overlap and look at each variable that way.
Take F-Strike for example
F-Strike swinging: would be a tremendous variable for K%
F-Strike looking: would be a good variable for K%
F-Strike Foul: would be a good variable for K%
F-Strike In play: obviously destroys K%
I don’t think it would be possible to get that data.
However some of the data we would be able to pull apart a bit. Eliminating that overlap would really help us reach the goal of having a rock solid equation.
by matthan on Jul 10, 2009 2:54 PM EDT
Actually perhaps I could get that data
It wouldn’t be exact, but it would be close. But then again I’d still have some overlap. Interesting nonetheless.
by matthan on Jul 10, 2009 3:25 PM EDT
How?
Outside of bludgeoning yourself with Pitch F/X game by game?
by FreeZorilla on Jul 10, 2009 3:43 PM EDT
It wouldn't be exact (which is a problem of course)
Basically you can find the # of batters the pitcher threw a FP strike to: FStrike% * TBF
Then you can estimate further by multiplying that number by, say, OZone SwStr% (as long as it is out of total pitches thrown)
That’ll give you a rough estimate of Ozone FStrike SwStr%
I’m not sure how close it’ll be since batters change their approach given the count.
by matthan on Jul 10, 2009 4:15 PM EDT
I wouldn't do that.
Such a relatively large % of 1st strikes (not balls in play) are looking. For third strikes it's probably at least the inverse.
by FreeZorilla on Jul 10, 2009 5:36 PM EDT
Nearly 3x as many Ks swinging than looking
That can’t hold true on first strikes.
by FreeZorilla on Jul 10, 2009 5:37 PM EDT
Too many variables
for me to contemplate including the strategic variables
and the subjective variables that cannot be measured,
or so it seems to me…..
i am not proficient in calculus, but it seems to be the same endless
search as the method for predicting the outcome of a random lottery
result?
if we are proving intuition, then the academic exercise would be most satisfying
to the statistician, but also beneficial, in summary form, to a manager or coach.
meanwhile, I observe that Evan Longoria has a hole in his swing, down and in but
borderline strike…:-)
"You came into my life, you came into my heart, you came into my family"
by bgfour on Jul 11, 2009 1:52 PM EDT
Just got to this.
I love this stuff.
by R.J. Anderson on Jul 11, 2009 11:43 PM EDT
