## Expected K% & uBB% based upon Pitch Results & Plate Discipline

Lately we've seen quite a few posts relating plate discipline and pitch results to walks and strikeouts. Intuitively this makes sense. The scenario that occurs after a pitch is thrown should have a strong link to strikeouts and walks.

This led me down the path of starting a project using these results, both plate discipline and pitch results, to formulate an equation via multiple regression that would predict expected strikeouts and expected unintentional walks. So far on this site we've only compared and contrasted a few of these results, and in reality there are quite a few. I'm sure some haven't even been measured yet that may have a strong impact, and I'm not even totally sure if I was able to grab them all.

This is essentially just the start of the project. I'm not totally sure if the end results will be good or bad. I'm sure there are independent variables I missed and quite a few that could be removed. There are tons of possible combinations and tons of tests to run to make sure the model is actually okay to use. So if you want to play around, offer suggestions, or help in any way, please do.

That being said I did find two pretty solid equations. We certainly can improve, but I don't think the results will change that much.

Here are the results. I know many of you don't need or want to get into the statistical stuff and are just interested in what this really means. Essentially, eK and euBB are based upon certain results (13 possible), ranging from called strikes, first-pitch strikes, and fouls to out-of-zone contact, etc.

| Model | Years (qualified pitchers) | Adj R-Squared | MAPE | MSE | RMSE |
|---|---|---|---|---|---|
| eK% formula | 2003-2008 | 92.7507% | 5.8571% | 0.0138% | 1.1736% |
| euBB% formula | 2003-2008 | 77.4111% | 11.9994% | 0.0101% | 1.0045% |

2009 Notable Rays Players

| Last | K% | eK% | Error | uBB% | euBB% | Error |
|---|---|---|---|---|---|---|
| Sonnanstine | 13.81% | 16.03% | 2.21% | 5.80% | 4.93% | -0.88% |
| Wheeler | 17.21% | 16.63% | -0.58% | 4.10% | 3.31% | -0.79% |
| Price | 23.04% | 22.53% | -0.51% | 15.20% | 10.45% | -4.75% |
| Garza | 20.88% | 19.27% | -1.61% | 9.67% | 9.41% | -0.26% |
| Balfour | 23.43% | 22.27% | -1.16% | 11.43% | 10.53% | -0.90% |
| Niemann | 12.85% | 13.96% | 1.11% | 9.78% | 8.90% | -0.87% |
| Nelson | 21.19% | 22.20% | 1.01% | 12.58% | 9.68% | -2.90% |
| Kazmir | 16.89% | 17.64% | 0.75% | 11.15% | 8.65% | -2.50% |
| Shields | 17.22% | 17.05% | -0.17% | 4.70% | 4.16% | -0.53% |

2009 Other League Notables

| Last | K% | eK% | Error | uBB% | euBB% | Error |
|---|---|---|---|---|---|---|
| Baker | 19.80% | 20.14% | 0.34% | 4.82% | 5.30% | 0.48% |
| Beckett | 22.06% | 21.36% | -0.70% | 7.28% | 6.58% | -0.70% |
| Billingsley | 23.52% | 23.72% | 0.20% | 9.41% | 8.62% | -0.79% |
| Braden | 14.96% | 15.75% | 0.79% | 5.80% | 4.70% | -1.10% |
| Burnett | 21.91% | 20.05% | -1.86% | 11.50% | 9.67% | -1.82% |
| Cain | 19.83% | 20.16% | 0.33% | 8.96% | 6.29% | -2.67% |
| Danks | 21.14% | 22.25% | 1.11% | 7.96% | 8.04% | 0.08% |
| Dempster | 19.69% | 20.67% | 0.98% | 9.07% | 8.75% | -0.32% |
| Feldman | 12.33% | 13.55% | 1.22% | 8.22% | 8.89% | 0.67% |
| Galarraga | 15.23% | 17.51% | 2.27% | 10.07% | 7.66% | -2.41% |
| Gallardo | 26.14% | 23.23% | -2.92% | 10.68% | 10.43% | -0.25% |
| Greinke | 25.49% | 20.71% | -4.79% | 4.15% | 6.95% | 2.80% |
| Halladay | 21.35% | 20.48% | -0.88% | 3.70% | 3.42% | -0.28% |
| Hamels | 20.20% | 21.42% | 1.22% | 4.35% | 5.59% | 1.24% |
| Hammel | 15.78% | 14.63% | -1.15% | 4.81% | 5.70% | 0.89% |
| Haren | 25.87% | 24.58% | -1.29% | 3.26% | 5.53% | 2.27% |
| Hernandez | 23.51% | 22.39% | -1.11% | 7.01% | 6.70% | -0.31% |
| E Jackson | 20.35% | 19.87% | -0.48% | 7.00% | 7.57% | 0.57% |
| Josh Johnson | 20.65% | 20.16% | -0.48% | 5.87% | 5.84% | -0.03% |
| Ra Johnson | 20.46% | 21.14% | 0.68% | 7.42% | 6.49% | -0.93% |
| Jurrjens | 16.63% | 17.08% | 0.45% | 8.87% | 6.68% | -2.19% |
| Kershaw | 24.48% | 22.85% | -1.63% | 13.14% | 11.24% | -1.90% |
| Cliff Lee | 16.41% | 15.40% | -1.01% | 5.79% | 5.09% | -0.71% |
| Lester | 27.39% | 24.85% | -2.54% | 7.35% | 8.50% | 1.15% |
| Lilly | 21.46% | 21.26% | -0.20% | 4.87% | 4.76% | -0.11% |
| Lincecum | 28.95% | 24.38% | -4.57% | 5.95% | 6.67% | 0.72% |
| Liriano | 20.71% | 20.41% | -0.30% | 10.35% | 9.09% | -1.27% |
| Lowe | 11.97% | 12.83% | 0.86% | 6.84% | 6.70% | -0.14% |
| Oswalt | 18.86% | 19.36% | 0.50% | 5.93% | 5.02% | -0.92% |
| Owings | 13.83% | 16.08% | 2.25% | 10.12% | 7.92% | -2.20% |
| Pavano | 16.67% | 16.66% | -0.01% | 4.69% | 3.92% | -0.78% |
| Penny | 15.40% | 14.66% | -0.74% | 7.07% | 7.26% | 0.19% |
| Pettitte | 15.10% | 16.10% | 1.01% | 8.97% | 8.31% | -0.66% |
| Porcello | 12.67% | 12.91% | 0.24% | 8.36% | 7.17% | -1.19% |
| Rodriguez | 22.84% | 22.02% | -0.83% | 7.76% | 8.03% | 0.27% |
| Sabathia | 17.98% | 19.82% | 1.84% | 6.46% | 6.50% | 0.03% |
| Joh Santana | 23.11% | 24.14% | 1.03% | 7.34% | 4.56% | -2.78% |
| Scherzer | 23.15% | 24.24% | 1.09% | 9.07% | 7.93% | -1.14% |
| Vazquez | 28.51% | 27.12% | -1.39% | 4.82% | 5.56% | 0.74% |
| Verlander | 29.50% | 27.45% | -2.05% | 6.49% | 6.58% | 0.10% |
| Je Weaver | 20.30% | 19.45% | -0.85% | 7.05% | 5.89% | -1.16% |
| Zito | 18.00% | 19.09% | 1.09% | 8.88% | 7.79% | -1.09% |

* There is no JP Howell data for 2009 on StatCorner which is why he isn't here

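** Both models are fairly accurate, and eK% is the stronger of the two (Adj R-squared of about 92.8% versus 77.4% for euBB%). The euBB% model also seems to be biased toward negative errors, meaning it tends to predict a lower walk rate than the pitcher actually posted. That would have to be fixed (which is why help would be great).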
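For anyone who wants to check the bias claim by hand, here is a minimal sketch that recomputes the error measures for the nine Rays pitchers in the table above. It assumes the usual textbook definitions of mean error, MAPE, MSE and RMSE; the workbook may define them slightly differently. The function name `fit_metrics` is made up for this example, and the values are in percentage points, so the MSE here is in squared percentage points.

```python
import math

# (actual uBB%, expected euBB%) for the nine 2009 Rays pitchers in the table
rays_ubb = {
    "Sonnanstine": (5.80, 4.93),
    "Wheeler": (4.10, 3.31),
    "Price": (15.20, 10.45),
    "Garza": (9.67, 9.41),
    "Balfour": (11.43, 10.53),
    "Niemann": (9.78, 8.90),
    "Nelson": (12.58, 9.68),
    "Kazmir": (11.15, 8.65),
    "Shields": (4.70, 4.16),
}


def fit_metrics(pairs):
    """Return mean error (bias), MAPE, MSE and RMSE for (actual, expected) pairs."""
    errors = [expected - actual for actual, expected in pairs]
    n = len(errors)
    mean_error = sum(errors) / n  # negative means the model under-predicts
    mape = sum(abs(e) / actual for e, (actual, _) in zip(errors, pairs)) / n
    mse = sum(e * e for e in errors) / n
    return {"mean_error": mean_error, "mape": mape, "mse": mse, "rmse": math.sqrt(mse)}


metrics = fit_metrics(list(rays_ubb.values()))
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this small group every error is negative, averaging roughly -1.6 percentage points, which matches the direction of the bias described above. Nine pitchers is far too few to say anything about the model as a whole.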

All in all I've accumulated 13 independent variables across 2003-2009, plus the dependent variable for each model (K% and uBB%). I ran the regressions on data for qualified pitchers (+/- a few) from 2003-2008, using 2009 as a test or holdout period.
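The fit-then-holdout setup can be sketched in a few lines. This is only an illustration on synthetic data: the three predictors, their "true" coefficients, and the helper names (`solve`, `fit_ols`, `predict`) are all made up, and the real regressions were run in a spreadsheet, not in this code. The point is just the shape of the procedure: fit on 2003-2008 rows, then measure the error on 2009 rows the model never saw.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xi = [[1.0] + list(row) for row in X]
    k = len(Xi[0])
    XtX = [[sum(r[i] * r[j] for r in Xi) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xi, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic pitcher-seasons: 20 per year, three made-up predictors
random.seed(0)
rows = []
for i in range(140):
    year = 2003 + i % 7  # cycles through 2003..2009
    x = [random.uniform(0.05, 0.40) for _ in range(3)]
    y = 0.10 + 1.2 * x[0] - 0.4 * x[1] + 0.3 * x[2] + random.gauss(0, 0.01)
    rows.append((year, x, y))

train = [(x, y) for year, x, y in rows if year <= 2008]   # fit on 2003-2008
holdout = [(x, y) for year, x, y in rows if year == 2009]  # test on 2009

beta = fit_ols([x for x, _ in train], [y for _, y in train])
errors = [predict(beta, x) - y for x, y in holdout]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5

print("coefficients (intercept first):", [round(b, 3) for b in beta])
print("holdout RMSE:", round(rmse, 4))
```

Solving the normal equations directly is fine for a toy like this. With 13 strongly correlated predictors, a proper statistics package is the safer choice.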

I believe I found the highest Adj R-squared for both models. Each equation uses only 11 of the 13 independent variables. I'll link the workbooks at the end, so if you want to look over the models and statistics they will be there. I also included the numbers for a bunch of different tests, so feel free to check them out (I haven't looked at them very hard yet).
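Why would an equation with 11 variables beat one with all 13? Adjusted R-squared penalizes every extra predictor, so dropping a variable that adds almost nothing can raise it. The sketch below uses the standard formula. The sample size and the two raw R-squared values are hypothetical, chosen only to show the effect, and are not taken from the workbooks.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical numbers, only to show the penalty at work
n_obs = 500
full = adjusted_r_squared(0.9300, n_obs, 13)     # all 13 variables
reduced = adjusted_r_squared(0.9299, n_obs, 11)  # two weak variables dropped

print(f"13 variables: {full:.5f}")
print(f"11 variables: {reduced:.5f}")
print("reduced model wins:", reduced > full)
```

Here the raw R-squared barely drops when the two variables go, so the smaller penalty more than makes up for it.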

Here are the two equations that I believe had the highest Adj R-Sq:

K = 0.34523  + ( (Ball) * -0.092208 )  + ( (ClStr) * 0.642177 )  + ( (SwStr) * 1.35 )  + ( (Foul) * 0.981356 )  + ( (InPly) * -0.343883 )  + ( (Oswing) * -0.015719 )  + ( (Zswing) * -0.146531 )  + ( (Swing) * -0.42555 )  + ( (Ocontact) * -0.038438 )  + ( (Contact) * -0.184088 )  + ( (Fstrike) * -0.000762 )

uBB = 0.58193  + ( (Ball) * 0.05506 )  + ( (ClStr) * -0.443504 )  + ( (SwStr) * -0.303051 )  + ( (Foul) * 0.092248 )  + ( (InPly) * -0.352155 )  + ( (Oswing) * -0.055224 )  + ( (Swing) * -0.366769 )  + ( (Ocontact) * 0.005447 )  + ( (Contact) * -0.173878 )  + ( (Zone) * -0.043222 )  + ( (Fstrike) * -0.053383 )
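To make the two equations easier to play with outside the workbooks, here they are as a small Python sketch. The coefficients are copied from the equations above. Two things are assumptions: that the inputs are entered as fractions (0.09 for 9%), which is what the size of the intercepts suggests, and the sample pitcher line, which is invented and does not belong to any real pitcher.

```python
K_INTERCEPT = 0.34523
K_COEFS = {
    "Ball": -0.092208, "ClStr": 0.642177, "SwStr": 1.35, "Foul": 0.981356,
    "InPly": -0.343883, "Oswing": -0.015719, "Zswing": -0.146531,
    "Swing": -0.42555, "Ocontact": -0.038438, "Contact": -0.184088,
    "Fstrike": -0.000762,
}

UBB_INTERCEPT = 0.58193
UBB_COEFS = {
    "Ball": 0.05506, "ClStr": -0.443504, "SwStr": -0.303051, "Foul": 0.092248,
    "InPly": -0.352155, "Oswing": -0.055224, "Swing": -0.366769,
    "Ocontact": 0.005447, "Contact": -0.173878, "Zone": -0.043222,
    "Fstrike": -0.053383,
}


def expected_rate(intercept, coefs, rates):
    """Intercept plus the sum of coefficient * rate over the model's variables."""
    return intercept + sum(coef * rates[name] for name, coef in coefs.items())


# Invented pitcher line, as fractions (assumption: 0.09 means 9%)
sample = {
    "Ball": 0.36, "ClStr": 0.17, "SwStr": 0.09, "Foul": 0.17, "InPly": 0.21,
    "Oswing": 0.25, "Zswing": 0.65, "Swing": 0.45, "Ocontact": 0.60,
    "Contact": 0.80, "Zone": 0.50, "Fstrike": 0.58,
}

ek = expected_rate(K_INTERCEPT, K_COEFS, sample)
eubb = expected_rate(UBB_INTERCEPT, UBB_COEFS, sample)
print(f"eK%:   {ek:.2%}")
print(f"euBB%: {eubb:.2%}")
```

Keeping the coefficients in dictionaries means a variable can be dropped or swapped without touching the function, which is handy when testing other combinations.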

Like I said before I'm sure there is a better equation out there. I'm sure something simpler as well. Feel free to mess around. I've mixed and matched a bit and in fact I did find another "K" equation from the 03-08 data that actually fits the 09 data better than the equation above. The difference isn't huge though.

These are the independent variables that I used (all in percent):

Balls, Cl Strike, Sw Strike, Foul, In Play, O Swing, Z Swing, Swing, O Contact, Z Contact,  Contact, Zone, F Strike

Correlation matrix for K% (split up into two for easier viewing):

| | K | Ball | Cl Str | Sw Str | Foul | In Play | O Swi |
|---|---|---|---|---|---|---|---|
| K | 1.00 | -0.27 | -0.02 | 0.85 | 0.41 | -0.80 | 0.34 |
| Ball | -0.27 | 1.00 | -0.22 | -0.31 | -0.54 | -0.24 | -0.38 |
| Cl Str | -0.02 | -0.22 | 1.00 | -0.27 | -0.34 | 0.16 | -0.09 |
| Sw Str | 0.85 | -0.31 | -0.27 | 1.00 | 0.31 | -0.65 | 0.44 |
| Foul | 0.41 | -0.54 | -0.34 | 0.31 | 1.00 | -0.30 | 0.27 |
| In Play | -0.80 | -0.24 | 0.16 | -0.65 | -0.30 | 1.00 | -0.14 |
| O Swi | 0.34 | -0.38 | -0.09 | 0.44 | 0.27 | -0.14 | 1.00 |
| Z Swi | 0.01 | -0.24 | -0.67 | 0.22 | 0.51 | 0.05 | -0.21 |
| Swing | 0.27 | -0.80 | -0.35 | 0.46 | 0.72 | 0.09 | 0.44 |
| O Con | -0.57 | 0.07 | 0.15 | -0.66 | 0.01 | 0.41 | 0.09 |
| Z Con | -0.74 | 0.17 | 0.19 | -0.78 | -0.30 | 0.66 | -0.02 |
| Con | -0.85 | 0.13 | 0.22 | -0.94 | -0.15 | 0.71 | -0.29 |
| Zone | -0.01 | -0.61 | 0.24 | -0.05 | 0.34 | 0.28 | -0.37 |
| F Str | 0.14 | -0.79 | 0.29 | 0.17 | 0.36 | 0.25 | 0.31 |

| | Z Swi | Swing | O Con | Z Con | Con | Zone | F Str |
|---|---|---|---|---|---|---|---|
| K | 0.01 | 0.27 | -0.57 | -0.74 | -0.85 | -0.01 | 0.14 |
| Ball | -0.24 | -0.80 | 0.07 | 0.17 | 0.13 | -0.61 | -0.79 |
| Cl Str | -0.67 | -0.35 | 0.15 | 0.19 | 0.22 | 0.24 | 0.29 |
| Sw Str | 0.22 | 0.46 | -0.66 | -0.78 | -0.94 | -0.05 | 0.17 |
| Foul | 0.51 | 0.72 | 0.01 | -0.30 | -0.15 | 0.34 | 0.36 |
| In Play | 0.05 | 0.09 | 0.41 | 0.66 | 0.71 | 0.28 | 0.25 |
| O Swi | -0.21 | 0.44 | 0.09 | -0.02 | -0.29 | -0.37 | 0.31 |
| Z Swi | 1.00 | 0.63 | -0.24 | -0.29 | -0.14 | 0.30 | 0.11 |
| Swing | 0.63 | 1.00 | -0.13 | -0.26 | -0.23 | 0.46 | 0.61 |
| O Con | -0.24 | -0.13 | 1.00 | 0.44 | 0.75 | -0.10 | 0.00 |
| Z Con | -0.29 | -0.26 | 0.44 | 1.00 | 0.81 | -0.11 | -0.01 |
| Con | -0.14 | -0.23 | 0.75 | 0.81 | 1.00 | 0.16 | 0.00 |
| Zone | 0.30 | 0.46 | -0.10 | -0.11 | 0.16 | 1.00 | 0.52 |
| F Str | 0.11 | 0.61 | 0.00 | -0.01 | 0.00 | 0.52 | 1.00 |

What is notable:

The correlations pretty much make sense. Swinging strikes are highly correlated with K's (0.85). Anything to do with contact, especially balls in play, is highly negatively correlated. What is really interesting is that called strikes, first-pitch strikes, and zone percentage aren't what I originally expected. First, called strikes have almost no correlation with K's (-0.02, barely negative). A bit strange, but perhaps that makes sense on some level. Or perhaps it points to a problem with the model. Thoughts? I would have thought Zone (-0.01) and F-Strike (0.14) would have had a higher correlation. In a way it makes sense, though: if you are throwing in the zone (especially on the first pitch), there is a higher chance of a ball in play, which eliminates the K potential.
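For readers who want to rebuild a matrix like this from the linked data, here is a minimal sketch of the Pearson correlation behind each cell. The five pitcher lines are invented for illustration, and the helper names `pearson` and `correlation_matrix` are made up for this example. A spreadsheet's CORREL function should give the same numbers on the same columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented rates for five pitchers, one list per variable
columns = {
    "K": [0.13, 0.17, 0.20, 0.23, 0.28],
    "Sw Str": [0.060, 0.075, 0.085, 0.100, 0.120],
    "In Play": [0.230, 0.210, 0.200, 0.185, 0.170],
}

matrix = correlation_matrix(columns)
for row_name, row in matrix.items():
    print(row_name.ljust(8), "  ".join(f"{value:+.2f}" for value in row.values()))
```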

Correlation matrix for uBB% (split up into two for easier viewing):

| | uBB | Ball | Cl Str | Sw Str | Foul | In Ply | O Swi |
|---|---|---|---|---|---|---|---|
| uBB | 1.00 | 0.75 | -0.27 | 0.07 | -0.19 | -0.56 | -0.27 |
| Ball | 0.75 | 1.00 | -0.22 | -0.31 | -0.54 | -0.24 | -0.38 |
| Cl Str | -0.27 | -0.22 | 1.00 | -0.27 | -0.34 | 0.16 | -0.09 |
| Sw Str | 0.07 | -0.31 | -0.27 | 1.00 | 0.31 | -0.65 | 0.44 |
| Foul | -0.19 | -0.54 | -0.34 | 0.31 | 1.00 | -0.30 | 0.27 |
| In Ply | -0.56 | -0.24 | 0.16 | -0.65 | -0.30 | 1.00 | -0.14 |
| O Swi | -0.27 | -0.38 | -0.09 | 0.44 | 0.27 | -0.14 | 1.00 |
| Z Swi | -0.10 | -0.24 | -0.67 | 0.22 | 0.51 | 0.05 | -0.21 |
| Swing | -0.57 | -0.80 | -0.35 | 0.46 | 0.72 | 0.09 | 0.44 |
| O Con | -0.17 | 0.07 | 0.15 | -0.66 | 0.01 | 0.41 | 0.09 |
| Z Con | -0.22 | 0.17 | 0.19 | -0.78 | -0.30 | 0.66 | -0.02 |
| Con | -0.25 | 0.13 | 0.22 | -0.94 | -0.15 | 0.71 | -0.29 |
| Zone | -0.51 | -0.61 | 0.24 | -0.05 | 0.34 | 0.28 | -0.37 |
| F Str | -0.70 | -0.79 | 0.29 | 0.17 | 0.36 | 0.25 | 0.31 |

| | Z Swi | Swing | O Con | Z Con | Con | Zone | F Str |
|---|---|---|---|---|---|---|---|
| uBB | -0.10 | -0.57 | -0.17 | -0.22 | -0.25 | -0.51 | -0.70 |
| Ball | -0.24 | -0.80 | 0.07 | 0.17 | 0.13 | -0.61 | -0.79 |
| Cl Str | -0.67 | -0.35 | 0.15 | 0.19 | 0.22 | 0.24 | 0.29 |
| Sw Str | 0.22 | 0.46 | -0.66 | -0.78 | -0.94 | -0.05 | 0.17 |
| Foul | 0.51 | 0.72 | 0.01 | -0.30 | -0.15 | 0.34 | 0.36 |
| In Ply | 0.05 | 0.09 | 0.41 | 0.66 | 0.71 | 0.28 | 0.25 |
| O Swi | -0.21 | 0.44 | 0.09 | -0.02 | -0.29 | -0.37 | 0.31 |
| Z Swi | 1.00 | 0.63 | -0.24 | -0.29 | -0.14 | 0.30 | 0.11 |
| Swing | 0.63 | 1.00 | -0.13 | -0.26 | -0.23 | 0.46 | 0.61 |
| O Con | -0.24 | -0.13 | 1.00 | 0.44 | 0.75 | -0.10 | 0.00 |
| Z Con | -0.29 | -0.26 | 0.44 | 1.00 | 0.81 | -0.11 | -0.01 |
| Con | -0.14 | -0.23 | 0.75 | 0.81 | 1.00 | 0.16 | 0.00 |
| Zone | 0.30 | 0.46 | -0.10 | -0.11 | 0.16 | 1.00 | 0.52 |
| F Str | 0.11 | 0.61 | 0.00 | -0.01 | 0.00 | 0.52 | 1.00 |

What is notable:

Well, for the most part the obvious things hold true. Balls are highly correlated with walks (0.75). Swings and contact are, for the most part, negatively correlated. A key to limiting BB would be throwing first-pitch strikes. That is very intuitive, but the large negative correlation (-0.70) bears it out.

I think I'm going to limit this post to just this. I'll answer whatever question I can in the comment section. And I do have quite a few comments on the players themselves, but I wanted to save that for comments.

All of the data, as well as the audit for the regressions, will be linked just below. Check them out. Once I hear some suggestions, thoughts, and opinions I'll know what step should be taken next if any step at all.

If you want to play around with the notable and Rays pitchers, for example by changing the value of any independent variable for a specific pitcher to see what would happen to his expected rates, click here:

eK for 2009

euBB for 2009
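Because both models are linear, a what-if on a single variable does not need the pitcher's full line: the change in the expected rate is just the coefficient times the change in the variable. Here is a minimal sketch for the eK equation, assuming inputs are fractions (0.01 is one percentage point). The helper name `ek_change` is made up for this example.

```python
# Coefficients from the eK equation in this post
K_COEFS = {
    "Ball": -0.092208, "ClStr": 0.642177, "SwStr": 1.35, "Foul": 0.981356,
    "InPly": -0.343883, "Oswing": -0.015719, "Zswing": -0.146531,
    "Swing": -0.42555, "Ocontact": -0.038438, "Contact": -0.184088,
    "Fstrike": -0.000762,
}


def ek_change(changes):
    """Change in expected K rate for the given changes in predictors."""
    return sum(K_COEFS[name] * delta for name, delta in changes.items())


# One extra swinging strike per 100 pitches, everything else held fixed
one_var = ek_change({"SwStr": 0.01})

# The same swinging strike, taken away from balls in play
traded = ek_change({"SwStr": 0.01, "InPly": -0.01})

print(f"SwStr +1 point:             eK {one_var:+.4f}")
print(f"SwStr +1 point, InPly -1:   eK {traded:+.4f}")
```

Keep in mind that the inputs are not independent in real life. A pitcher who adds swinging strikes will also see his swing and contact rates move, so a one-variable change is a rough guide at best.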

If you want to look at the regression, the audit, and all the regression statistics, as well as how the equation performs on both the sample and the holdout period, click here:

eK regression with highest Adj R sq

euBB regression w highest adj R sq

If you want to run your own regressions on the data set, click here (to add a variable you have to find the data and add it to the sheet; deleting one is simple):

2003-2008 data with 2009

For anyone interested, by far the largest problem with this project was collecting the data from Fangraphs and StatCorner. Once I consolidated the data, running the regressions and testing on the holdout period was quite easy. Of course, with the sheer number of combinations, testing everything would be highly time consuming.

My fantasy would be to be able to create an accurate eK or euBB based upon these sorts of variables and then be able to plug them in as part of an expectedFIP.
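As a rough idea of what that plug-in could look like, here is a sketch built on the standard FIP formula, (13*HR + 3*(BB + HBP) - 2*K) / IP + constant. Everything specific is an assumption: the function name `expected_fip` is made up, the rates are treated as per batter faced, the constant of 3.10 is a placeholder (the real one changes by season), and the sample inputs are invented. Note also that euBB excludes intentional walks, while some FIP versions count them.

```python
def expected_fip(ek_rate, eubb_rate, batters_faced, innings,
                 home_runs, hbp, fip_constant=3.10):
    """FIP with expected strikeouts and unintentional walks swapped in.

    Assumes ek_rate and eubb_rate are fractions of batters faced.
    """
    exp_k = ek_rate * batters_faced      # expected strikeouts
    exp_bb = eubb_rate * batters_faced   # expected unintentional walks
    return (13 * home_runs + 3 * (exp_bb + hbp) - 2 * exp_k) / innings + fip_constant


# Invented season line, for illustration only
value = expected_fip(ek_rate=0.20, eubb_rate=0.07, batters_faced=800,
                     innings=200, home_runs=20, hbp=5)
print(f"expected FIP: {value:.3f}")
```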

This post was written by a member of the DRaysBay community and does not necessarily express the views or opinions of DRaysBay staff.
