DRaysBay: An SB Nation Community


Updated Expected Strikeouts based on Pitch Result

 

A week or so ago I began searching for a way to predict expected strikeout rates. Jump to the bottom to see the new expected strikeout rates for all 2009 pitchers with 30 or more innings, or keep reading if the process and the statistics interest you.

 

I initially gathered data from 2003-2009 for all the plate discipline and pitch result categories on FanGraphs and StatCorner. I ran a regression against K% and kept the statistically significant variables. The adjusted R-squared was very high, and the model passed the usual validity checks. I could essentially look at the results of a pitcher’s pitches and tell you what his strikeout rate would be. Very powerful stuff.

 

But here comes the obligatory but.


As FreeZorilla pointed out, there were definitely some problems. First, there was some overlap among the variables, namely multicollinearity. This basically means that the independent variables are correlated with each other. It doesn’t ruin the model, but we can do better. Second, the model had limited practical use because it wasn’t comparing apples to apples. For example, one variable was swinging strikes as a % of total pitches, whereas another was the % of contact a hitter made on pitches thrown in the zone. I wanted to standardize the inputs and make the model easier to understand and apply without taking away predictive power.

 

To solve these problems I decided to use only the results that can happen once a pitcher throws a baseball: a ball, a swinging strike, in play, foul, and called strike. Swinging strikes, balls in play, and fouls can each be broken down further into in the zone and out of the zone. Together these make up 100% of the possibilities when a pitcher throws a pitch. In my analysis I found it was better to use plain “In Play” and “Foul” rather than splitting them by zone, while for swinging strikes it was better to differentiate between in-zone and out-of-zone swinging strikes.

 

Next, I expressed all of these metrics as a share of total pitches thrown. So now we can say that if a pitcher throws 2% more called strikes and 2% fewer balls (both as a share of total pitches), his strikeout rate should be about 1.8 percentage points higher.

 

I also set the “constant” in the regression formula to zero. A non-zero constant is not logical here: if a pitcher throws no strikes, or no pitches at all, he will have a zero strikeout rate. The constant must be zero.
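
Dropping the constant amounts to fitting the regression through the origin. The sketch below shows one way that could look using only the Python standard library. The function name, the two-variable toy data, and the coefficients used to build that data are all made up for illustration; they are not the data or the software behind this post.

```python
def fit_through_origin(rows, targets):
    """Least-squares coefficients for targets ~ rows * b, with no constant term."""
    k = len(rows[0])
    # Build the normal equations (X'X) b = X'y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    # Solve them by Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
                rhs[row] -= factor * rhs[col]
    return [rhs[i] / a[i][i] for i in range(k)]


# Hypothetical toy data: two pitch-result percentages per pitcher, with the
# target built from made-up coefficients (0.9 and 1.5) so the fit is checkable.
rows = [(16.0, 4.0), (18.0, 5.0), (17.0, 7.0), (20.0, 3.0)]
targets = [0.9 * x1 + 1.5 * x2 for x1, x2 in rows]

coefs = fit_through_origin(rows, targets)
print([round(c, 3) for c in coefs])
```

Because there is no constant term, a pitcher whose inputs are all zero gets a predicted value of exactly zero, which is the property argued for above.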

 

Here is the formula. I rounded it to make it a bit easier:

 

K%=(ClStr%*.9)+(Foul%*.5)+(InPly%*-.9)+(InZSwStr%*1.1)+(OZSwStr%*1.5)
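
As a rough illustration, the rounded formula can be written as a small Python function. The function name and dictionary keys are labels chosen for this sketch, not names from FanGraphs or StatCorner, and the example inputs are the 2009 league-average components (30+ IP) quoted in the comments.

```python
# Rounded coefficients from the formula above; the keys are illustrative labels.
COEFFICIENTS = {
    "cl_str": 0.9,       # called strikes
    "foul": 0.5,         # foul balls
    "in_play": -0.9,     # balls put in play
    "in_z_sw_str": 1.1,  # swinging strikes in the zone
    "o_z_sw_str": 1.5,   # swinging strikes out of the zone
}


def expected_k_pct(components):
    """Return eK% given each pitch result as a percentage of total pitches."""
    return sum(COEFFICIENTS[name] * components[name] for name in COEFFICIENTS)


# 2009 league-average components for pitchers with 30+ IP, from the comments.
league_average_2009 = {
    "cl_str": 17.7,
    "foul": 17.4,
    "in_play": 18.9,
    "in_z_sw_str": 2.7,
    "o_z_sw_str": 4.9,
}

ek = expected_k_pct(league_average_2009)
print(f"{ek:.1f}%")  # 17.9%
```

The result lines up with the average eK% of 17.9% reported in the comments, which suggests the rounding in the formula costs very little.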

 

The Adjusted R-Squared is: 91.4%

 

Adjusted R-squared essentially measures how strong the model is, that is, how much of the variation in K% the model explains. This is a very strong relationship, as many consider just hitting 70% to be good. The adjusted figure takes into account how many variables are in the model.

 

All of the independent variables in the formula are significant at the 99% confidence level. “Ball%” did not meet that bar, which is why I threw it out. The model also passes the F-test. Multicollinearity is now not really an issue. And most importantly, it passes the simple logic test.

 

Again, any and all help is appreciated. I’m definitely not done looking at this. For example, I’ve begun to look at using “In zone Contact%” and “Out of zone Contact%” in place of In Play and Foul. Perhaps that will lead to a better model, since the pitcher has more control over those two metrics than over in play and foul.

 

Here are the 2009 expected strikeouts. This is for all pitchers with about 30+ innings. Remember the model was built on qualified pitchers only, so being able to predict this accurately for pitchers with so few innings is very telling: it suggests the model holds up even for smaller sample sizes. Due to some constraints I'll just post the Rays, Yanks, and Red Sox here. If you want to see all pitchers with about 30+ IP go to this link. The eK's as well as all the components for all the pitchers are in the link.

I highly recommend you check out the link. Some of the best info is in there.

Expected Strikeouts for 2009 pitchers with 30+ innings as well as component data

Name  K%  eK%  Difference (eK% - K%)
Andy Sonnanstine 13.81% 15.45% 1.64%
David Price 23.04% 21.46% -1.58%
Grant Balfour 23.46% 20.75% -2.71%
J.P. Howell 28.41% 29.85% 1.44%
James Shields 16.79% 15.27% -1.52%
Jeff Niemann 13.49% 14.18% 0.69%
Joe Nelson 20.78% 21.48% 0.70%
Lance Cormier 13.26% 14.32% 1.06%
Matt Garza 21.17% 18.89% -2.28%
Scott Kazmir 16.89% 17.88% 0.99%
Brad Penny 14.83% 14.70% -0.13%
Daisuke Matsuzaka 19.21% 18.56% -0.65%
Hideki Okajima 24.36% 23.65% -0.71%
Jon Lester 27.41% 24.60% -2.81%
Jonathan Papelbon 24.12% 22.76% -1.36%
Josh Beckett 22.07% 21.08% -0.99%
Justin Masterson 17.95% 19.02% 1.07%
Manny Delcarmen 15.79% 19.17% 3.38%
Ramon Ramirez 17.53% 22.33% 4.80%
Tim Wakefield 12.81% 12.30% -0.51%
A.J. Burnett 21.91% 19.63% -2.28%
Alfredo Aceves 21.80% 19.04% -2.76%
Andy Pettitte 14.67% 16.15% 1.48%
CC Sabathia 18.13% 18.23% 0.10%
Chien-Ming Wang 12.72% 14.35% 1.63%
Joba Chamberlain 19.40% 19.01% -0.39%
Mariano Rivera 30.07% 24.75% -5.32%
Phil Coke 20.39% 19.90% -0.49%
Phil Hughes 18.71% 19.92% 1.21%
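
For readers checking the sign convention, the Difference column is simply eK% minus K%. A minimal sketch using three rows copied from the table (the variable names are illustrative):

```python
# (name, actual K%, expected K%) copied from a few rows of the table above.
rows = [
    ("J.P. Howell", 28.41, 29.85),
    ("Ramon Ramirez", 17.53, 22.33),
    ("Mariano Rivera", 30.07, 24.75),
]

# Difference = eK% - K%, so a negative value means the pitcher is striking
# out more batters than his pitch results alone would suggest.
differences = {name: round(ek - k, 2) for name, k, ek in rows}

# Print from the most negative gap to the most positive one.
for name, diff in sorted(differences.items(), key=lambda item: item[1]):
    print(f"{name}: {diff:+.2f}")
```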


Comments


Bumped to the FP.

That R-Squared is quite impressive.

by R.J. Anderson on Jul 21, 2009 11:51 AM EDT reply actions   0 recs

Rec'd

Wow Rivera’s K% is 30%+
Can you explain again why Ball% was thrown out? It would seem this would be important.

Follow Me on Twitter @FreeZorilla

by FreeZorilla on Jul 21, 2009 11:57 AM EDT reply actions   0 recs

A few reasons

The coefficient was extremely small, so the t-statistic was really low. If I used smaller samples (say one or two years) the balls sometimes became more of a factor, but were still largely irrelevant. The largest coefficient I saw was only like 2%. Either way it comes down to the t-test. Basically, based upon the sample size and the coefficient, the t-test determines how confident we are that the coefficient is not zero. I was using 99% confidence and the balls didn’t pass the test. If left in, the coefficient would have been like .005, which compared to the others is largely irrelevant.

The reason why I think this is the case is because the other 5 metrics add up to all the possible strikes. So indirectly the balls are being factored in.

by matthan on Jul 21, 2009 12:04 PM EDT up reply actions   0 recs

thanks, makes sense


by FreeZorilla on Jul 21, 2009 12:18 PM EDT up reply actions   0 recs

Yeah it is pretty strong

I also checked every other regression test. The F-Test was good, and all the variables have huge t-values. There may be some minor issues, mainly in the swinging strike area, but this is a good representation.

by matthan on Jul 21, 2009 11:57 AM EDT reply actions   0 recs

Do any/all of these components stabilize faster than K%? If so then this is really gangbusters.

by Tommy Bennett on Jul 21, 2009 12:05 PM EDT reply actions   0 recs

I haven't really looked at this explicitly

What I did notice is that sample size doesn’t really matter with this formula. I built it by using qualified pitchers, but then applied it to all 2009 pitchers with 30 IP. The errors barely increased.

This leads me to believe that I can perhaps come up with an even more accurate formula if I used 2003-2008 data for say 30+ IP instead of qualified pitchers, but for now this will do.

by matthan on Jul 21, 2009 12:16 PM EDT up reply actions   0 recs

contact rate stabilizes faster

so the miss rate should do the same.

THIS STORY ONLY ENDS ONE WAY

by colintj on Jul 21, 2009 3:39 PM EDT up reply actions   0 recs

Since today is Jeff's turn I'd like to kind of focus on swinging strikes and Mr. Niemann

These are his current components:

ClStr: 16.4%
Foul: 19.3%
InPly: 19.7%
InZSwStr: 1.8%
OZSwStr: 3.68%

Basically he gets close to no swinging strikes. If you take a look at the link he is definitely near the bottom in swinging strikes for the league. If you take a look at the formula swinging strikes are extremely important. If he just boosts his swinging strikes by 1% he could increase his K rate by quite a bit.

For example, if he can turn 1.5% of his pitches from balls into out-of-zone swinging strikes, his expected K rate would increase by 2.25 percentage points. He would then be expected to strike out nearly 16.5% of batters, which is much more respectable.

Then the question is how to get that type of swinging strike. Well, a good first step would be pitch selection. As RJ has been hammering home for a while: curveball. I’m just guessing, but more curveballs could very well do the trick.

by matthan on Jul 21, 2009 12:13 PM EDT reply actions   0 recs
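
To make the arithmetic in the comment above concrete, here is a minimal sketch that applies the post's rounded formula to Niemann's quoted components and then to the what-if scenario. The function and key names are illustrative assumptions, not part of the original analysis.

```python
# Rounded coefficients from the post's formula (labels are illustrative).
COEFFICIENTS = {"cl_str": 0.9, "foul": 0.5, "in_play": -0.9,
                "in_z_sw_str": 1.1, "o_z_sw_str": 1.5}

# Niemann's components as quoted in the comment, in percent of total pitches.
niemann = {"cl_str": 16.4, "foul": 19.3, "in_play": 19.7,
           "in_z_sw_str": 1.8, "o_z_sw_str": 3.68}


def expected_k_pct(components):
    """Apply the rounded formula to a dict of pitch-result percentages."""
    return sum(COEFFICIENTS[name] * components[name] for name in COEFFICIENTS)


baseline = expected_k_pct(niemann)

# Balls carry no coefficient, so moving 1.5 points from balls to
# out-of-zone swinging strikes only changes the o_z_sw_str term.
what_if = dict(niemann, o_z_sw_str=niemann["o_z_sw_str"] + 1.5)
improved = expected_k_pct(what_if)

print(f"baseline eK%: {baseline:.2f}")  # 14.18
print(f"what-if eK%:  {improved:.2f}")  # 16.43
```

The baseline matches the 14.18% eK% listed for Niemann in the table, and the gain is 1.5 times the 1.5 coefficient, or 2.25 points.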

Great point.

Do you have the league averages handy? Maybe how each Rays starter compares to league average?

by rglass44 on Jul 21, 2009 12:15 PM EDT up reply actions   0 recs

On the link I have all pitchers with 30+ IP for 2009

We could average them out to determine the “average” pitcher. The problem though is it wouldn’t be adjusted based upon innings pitched.

I just did this on the link and this is what I got for the average pitcher in 2009 (30+ IP):

ClStr: 17.7%
Foul: 17.4%
InPly: 18.9%
InZSwStr: 2.7%
OZSwStr: 4.9%

Avg K%: 18.1%
Avg eK%: 17.9%

by matthan on Jul 21, 2009 12:18 PM EDT up reply actions   0 recs

A few notes:

Very interesting that the variable with the most weight is out-of-zone swinging strike %. Just goes to show that getting guys to chase bad pitches is key. This is why the Price/Kaz sliders are so key for their success.

I’m surprised how heavily called strike % affects it. I guess that is because I would expect this variable to be relatively similar from pitcher to pitcher.

by rglass44 on Jul 21, 2009 12:13 PM EDT reply actions   0 recs

yea that could make sense

I guess I was referring to first pitch called strikes


by FreeZorilla on Jul 21, 2009 12:25 PM EDT up reply actions   0 recs

Here are the standard deviations for 2009

ClStr: 1.8%
Foul: 2.2%
InPly: 2.3%
InZSwStr: 1%
OZSwStr: 1.8%

The differences across pitchers aren’t large, but small changes definitely do have a significant impact on K rates.

by matthan on Jul 21, 2009 1:03 PM EDT up reply actions   0 recs
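
One way to read those standard deviations alongside the formula is to multiply each by its rounded coefficient. This is only a sketch: it assumes one component moves by one standard deviation while the other four stay fixed, so the offsetting change lands in balls, which carry no coefficient. The dictionary names are illustrative.

```python
# Rounded coefficients from the post's formula and the 2009 standard
# deviations quoted in the comment (both in percent of total pitches).
COEFFICIENTS = {"ClStr": 0.9, "Foul": 0.5, "InPly": -0.9,
                "InZSwStr": 1.1, "OZSwStr": 1.5}
STD_DEVS = {"ClStr": 1.8, "Foul": 2.2, "InPly": 2.3,
            "InZSwStr": 1.0, "OZSwStr": 1.8}

# Change in eK% when one component moves by one standard deviation
# and the other four listed components are held fixed.
impact = {name: COEFFICIENTS[name] * STD_DEVS[name] for name in COEFFICIENTS}

# List the components from largest to smallest absolute effect.
for name, points in sorted(impact.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {points:+.2f}")
```

Under these assumptions a one-standard-deviation change in out-of-zone swinging strikes moves eK% the most, at about 2.7 points.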

At first glance it seems that turning your foul balls into swinging strikes makes a huge difference

Price, Kaz, and Niemann please take notice

I can't help that I make some things look easier than they really are.

by Sandy Kazmir on Jul 21, 2009 1:10 PM EDT up reply actions   0 recs

The question is: how?

Is it location? Pitch selection? Tougher batter quality?

by R.J. Anderson on Jul 21, 2009 1:12 PM EDT up reply actions   0 recs

Pitch selection would seem the biggest changer to me.

You can move location in and out of the zone, but if you go out you run the risk of a guy taking, and if you don’t command it properly you run the risk of it getting hit hard. Foul balls happen when a guy is expecting a certain pitch but the pitcher puts it in a good spot. For example, a guy is sitting fastball and gets it, but it’s low and away so he can only foul it off, or it’s a good diving curve that he only gets a piece of. I base this thought on the fact that with most foul balls the batter has the timing down. If a pitcher were to mix speeds better I think he could decrease foul balls and convert those into swinging strikes. On a related note, Price’s change was gorgeous last night; he made batters look foolish when he finally had the guts to go to it.


by Sandy Kazmir on Jul 21, 2009 1:24 PM EDT up reply actions   0 recs

Good points.

Changing speeds, and trying to get guys to chase. To do that you have to be ahead in the count. You also have to have a swing-and-miss pitch.

by rglass44 on Jul 21, 2009 1:26 PM EDT up reply actions   0 recs

Great work Matt


by Sandy Kazmir on Jul 21, 2009 1:04 PM EDT reply actions   0 recs

If interested here are the qualified Rays pitchers from 2003-2008

Last First Year K% eK% Difference
Sonnanstine Andy 2008 15.14% 14.88% -0.26%
Jackson Edwin 2008 13.64% 14.96% 1.32%
Shields James 2008 18.24% 14.41% -3.83%
Garza Matt 2008 16.58% 13.62% -2.96%
Kazmir Scott 2008 20.06% 22.24% 2.18%
Shields James 2007 21.05% 15.82% -5.23%
Kazmir Scott 2007 26.94% 19.59% -7.36%
Fossum Casey 2005 17.66% 23.90% 6.24%
Hendrickson Mark 2005 11.18% 10.25% -0.93%
Kazmir Scott 2005 21.27% 28.01% 6.74%
Hendrickson Mark 2004 10.83% 15.05% 4.22%
Zambrano Victor 2003 15.79% 16.89% 1.10%

by matthan on Jul 21, 2009 1:23 PM EDT reply actions   0 recs

A look at Rays pitchers with more than 1 qualified year

Garza 08-09
Kazmir 05-09 (no 06)
Shields 06-09
Sonnanstine 08-09

The average K% for those pitchers is 18.90%; the average eK% is 17.5%.

According to this, our pitchers have outperformed their underlying numbers.

The standard deviation for K% is 3.7%, whereas for eK% it is 3.4%.

by matthan on Jul 21, 2009 1:33 PM EDT reply actions   0 recs

Sonnanstine

K% eK%
2009 13.81% 15.45%
2008 15.14% 14.88%

by matthan on Jul 21, 2009 1:34 PM EDT up reply actions   0 recs

Shields

K% eK%
2009 16.79% 15.27%
2008 18.24% 14.41%
2007 21.05% 15.82%

by matthan on Jul 21, 2009 1:34 PM EDT up reply actions   0 recs

This is pretty surprising to me

Based on the model Shields is expected to strike out around 15-15.5% of batters. He has been pretty consistent on that front. However in reality he has been far better than that, but has been declining every single year. This could be a sign of serious regression. Perhaps Shields is more of the 16-17% K guy than the 18-21% he has shown in the past?

by matthan on Jul 21, 2009 1:41 PM EDT up reply actions   0 recs

Kazmir

K% eK%
2009 16.89% 17.88%
2008 20.06% 22.24%
2007 26.94% 19.59%
2005 21.27% 24.21%

by matthan on Jul 21, 2009 1:35 PM EDT up reply actions   0 recs

Actually the 2005 eK rate is quite a bit higher

28%, not sure what happened

Either way, both the actual K% and the eK% have been all over the map for Kaz

by matthan on Jul 21, 2009 1:46 PM EDT up reply actions   0 recs

Garza

K% eK%
2009 21.17% 18.89%
2008 16.58% 13.62%

by matthan on Jul 21, 2009 1:35 PM EDT up reply actions   0 recs

You've done a lot of good work

And these Rays specific comments are interesting. Maybe a separate post of Rays analysis when you get the chance? You could leave the science out at this point and just link to this with Rays analysis.


by FreeZorilla on Jul 21, 2009 1:51 PM EDT up reply actions   0 recs

Yeah that sounds like a good idea

I may tweak the formula a bit here and there (by using a slightly different sample), but nothing substantial. So I think the next step would be to take a look at the Rays pitchers.

by matthan on Jul 21, 2009 1:58 PM EDT up reply actions   0 recs

One issue.

So you were using strictly percentages and not total strikes, swinging strikes, etc.? Might it be a bit more illuminating if you used the actual amounts, so a pitcher with 32 IP doesn’t weigh the same as one with 235?

by rglass44 on Jul 21, 2009 2:05 PM EDT reply actions   0 recs

I don't think this is helpful.

We’re talking about a rate metric here. Yes, the guys with 235 innings have more stable rates, but people should know this.

by R.J. Anderson on Jul 21, 2009 2:12 PM EDT up reply actions   0 recs

In building the model I only used qualified pitchers from 2003-2008

For 2009 I just used the model based upon the qualified pitchers. So the disparity of IP doesn’t really factor in. I’m sure some guys at low IP will have a higher error than guys with tons of IP and stable rates. Although just from looking at how it applies to 2009 it doesn’t appear the increase in error due to low amount of innings is that great.

by matthan on Jul 21, 2009 2:16 PM EDT up reply actions   0 recs

Do you not think it would get a better picture though?

It wouldn’t be hard to include, I wouldn’t think.

Not doubting the validity, but just thinking of ways to make it better.

by rglass44 on Jul 21, 2009 2:37 PM EDT up reply actions   0 recs

Sandy Kazmir mentioned something about Howell

If you look here JP has an expected K rate over 5% higher than anyone else on the Rays, Yanks or Red Sox. He is really really good and is doing exactly as expected

by matthan on Jul 21, 2009 10:49 PM EDT reply actions   0 recs

Yeah almost 30% is good


by Sandy Kazmir on Jul 21, 2009 10:55 PM EDT up reply actions   0 recs

Biggest expected decliners

Aardsma: 29.4% to 22.8%, a drop of about 6.6%
Rafael Soriano: 34.7% to 28.8%, a drop of about 5.9%
Mariano Rivera: 30% to 24.75%, a drop of about 5.3%
Greinke: 25.5% to 20.3%, a drop of about 5.1%

by matthan on Jul 22, 2009 4:36 PM EDT up reply actions   0 recs

Biggest expected gainers

Mark Difelice: 22.8% to 30.7%, an increase of about 7.9%
Cla Meredith: 12.2% to 17.8%, an increase of about 5.6%
Ramon Ramirez: 17.5% to 22.3%, an increase of about 4.8%
Tommy Hanson: 14.2% to 18.4%, an increase of about 4.2%

by matthan on Jul 22, 2009 4:38 PM EDT up reply actions   0 recs

Top overall eK%

1. Broxton 37.5%
2. Wuertz 35.1%
3. Mark Difelice 30.7%
4. Joe Nathan 30.16%
5. JP Howell 29.85%
..
..
..
9. Javy Vazquez 26.5%
10. Verlander 26.35%

by matthan on Jul 22, 2009 4:41 PM EDT reply actions   0 recs
