I’ll be honest with you.
Had I known from the beginning that designing a model to estimate the difficulty of blocking every major league pitch is not a new idea, let alone a groundbreaking one, I might have spent significant portions of my free time doing stuff that involves sun and physical activity instead. But I didn’t. And, anyway, as Jovanotti already sang: “Se tutti i grandi libri qualcuno li ha già scritti, mi chiedo ragazzi voi che cosa fate?” (“If someone has already written all the great books, I ask myself, guys, what are you doing?”)
And, yes, I am aware you have no idea who Jovanotti is. Your loss, really.
Vertical or horizontal plane?
When deciding how to track the location of the pitches to be caught, I considered two options, neither of which seemed completely appropriate on its own. The first was the landing spot: with help from Mike Fast, I was able to calculate where every pitch would have landed had the ball traveled untouched until it hit the ground.
While this was really helpful for pitches that clearly landed in the dirt in front of the catcher, it told us little about the rest of the pitches. Is a pitch that would otherwise land 40 feet behind the catcher easier or harder to block than one that would land 30 feet behind him? And if two pitches both land 40 feet behind the catcher, do a curveball and a fastball cross the plane where the catcher is placed at the same height?
On the other hand, the PITCHf/x data offered the height and the width of every pitch as it crosses the front edge of the plate. This leads to similar problems. First, the catcher sets up several feet behind that plane. Second, the balls in the dirt are represented with a negative height, again leaving us to guess where they bounced.
Discussing these issues with Tom Tango made it clear that if we want to model reality, we have to make the model as real as possible. In this case that meant working in three dimensions and using both a horizontal and a vertical plane, the former for the pitches that usually bounce in front of the catcher and the latter for those that don’t.
I placed the vertical plane three feet behind the back edge of home plate, because that is where I expect the catcher’s mitt to be most of the time. You might or might not agree with this. I don’t think there is a single correct distance from the plate that can always be used, as some catchers position themselves deeper than others and they also adapt to where the batter stands in the batter’s box. Generally, they try to move as far forward as possible without interfering with the swing.
The letter C in the above graphic is placed just about three feet behind the back edge of home plate. And although the angle on the following shot of Yadier Molina doesn’t offer a great perspective, I think it can be assumed that his glove (not his body) is just behind the end of the batter’s box.
For sure, some of the pitches that would touch the ground a bit more than three feet behind home plate will not be caught in the air. Similarly, some of the ones that would bounce just a bit in front of that imaginary line will be caught on the fly. As we are not dealing with a binary model here, it doesn’t really matter all that much.
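For anyone who wants to play along at home, here is a rough Python sketch of the two-plane idea. It assumes the standard PITCHf/x constant-acceleration fit (x0, y0, z0, vx0, vy0, vz0, ax, ay, az, in feet and seconds, with y measured from the back tip of the plate toward the pitcher and z measured up from the ground). Reading “three feet behind the back edge” as y = -3 is an assumption, as are the function names and the choice to ignore the radius of the ball. It is not the code used for this article.

```python
import math

# Assumption: the catcher's plane sits three feet behind the back tip of the plate (y = 0).
CATCHER_PLANE_Y = -3.0


def _first_positive_root(a, b, c):
    """Smallest positive t solving a*t**2 + b*t + c = 0, or None if there is none."""
    if abs(a) < 1e-12:  # no acceleration along this axis: linear equation
        if abs(b) < 1e-12:
            return None
        t = -c / b
        return t if t > 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    sq = math.sqrt(disc)
    for t in sorted(((-b - sq) / (2 * a), (-b + sq) / (2 * a))):
        if t > 0:
            return t
    return None


def locate_pitch(x0, y0, z0, vx0, vy0, vz0, ax, ay, az, plane_y=CATCHER_PLANE_Y):
    """Classify a pitch as bouncing before the catcher's plane or crossing it in the air."""
    t_plane = _first_positive_root(0.5 * ay, vy0, y0 - plane_y)
    t_ground = _first_positive_root(0.5 * az, vz0, z0)

    def position(t):
        return (
            x0 + vx0 * t + 0.5 * ax * t * t,
            y0 + vy0 * t + 0.5 * ay * t * t,
            z0 + vz0 * t + 0.5 * az * t * t,
        )

    if t_ground is not None and (t_plane is None or t_ground < t_plane):
        # Horizontal plane: the ball reaches the ground first.
        x, y, _ = position(t_ground)
        return {"kind": "bounce", "width": x, "length": y - plane_y}
    if t_plane is None:
        raise ValueError("trajectory reaches neither the ground nor the catcher's plane")
    # Vertical plane: the ball is still in the air when it gets to the catcher.
    x, _, z = position(t_plane)
    return {"kind": "air", "width": x, "height": z}
```

The “length” returned for a bounced pitch is the distance from the bounce spot to the catcher’s plane, and the “height” for the others is the height above the ground at that plane.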
After calculating the appropriate landing/crossing spots and creating some quick buckets, this is what the probabilities of a pitch getting away from the catcher looked like, depending on its location:
In a nutshell, we see that pitches that bounce are harder to block than the ones that don’t and that pitches further away from the center of the plate pose more problems for the catchers than the ones over the plate. Incredible, right? Still, this kind of data presentation made it easier for me to visualize the areas where catchers have to receive the pitches.
For the purposes of this research, I didn’t really care whether the balls that got away from the catcher were scored as a wild pitch or as a passed ball. So, quite ingeniously and after many hours of creative brainstorming, I came up with a name for both of them together: Passed Pitch (PP). To determine the percentages in the above chart, we have to keep only the pitches on which a PP can happen in the first place. These are:
- With runners on base, all the called strikes, swinging strikes and balls*
- With no runners on base, all the called and swinging strikes when the count is already two strikes
* In addition, I decided to ignore all the pitchouts and intentional balls, although PP can occasionally happen on such pitches. The price of losing a few passed pitches seemed acceptable when compared to the danger of seriously skewing the data by including them.
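The filter above is simple enough to write down. Here is a minimal Python sketch; the `Pitch` record, its field names and the result labels are all hypothetical stand-ins, not the labels of any particular data feed.

```python
from dataclasses import dataclass

# Hypothetical result labels; a real data source will use its own.
STRIKES_TAKEN_OR_MISSED = {"called_strike", "swinging_strike"}
EXCLUDED = {"pitchout", "intentional_ball"}


@dataclass
class Pitch:
    result: str       # e.g. "called_strike", "swinging_strike", "ball", "foul"
    runners_on: int   # number of runners on base when the pitch is thrown
    strikes: int      # strikes in the count before the pitch


def can_be_passed_pitch(pitch):
    """True if a passed pitch (wild pitch or passed ball) can happen on this pitch."""
    if pitch.result in EXCLUDED:
        return False  # pitchouts and intentional balls are ignored on purpose
    if pitch.runners_on > 0:
        return pitch.result in STRIKES_TAKEN_OR_MISSED or pitch.result == "ball"
    # Bases empty: only a third strike gives the batter a chance to run.
    return pitch.strikes == 2 and pitch.result in STRIKES_TAKEN_OR_MISSED
```

For example, `can_be_passed_pitch(Pitch("ball", runners_on=0, strikes=2))` is false, because with nobody on base a ball that gets away changes nothing.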
I first looked at the PP dependency on pitch length/height. From the model above, the “length” of the pitch is the distance between the spot where the pitch bounced and the catcher’s plane. It is represented with a positive Y number. The “height” is the distance above ground at the catcher’s plane for the pitches that didn’t bounce. It is represented with a negative Y number.
The further in front of the catcher the pitch bounces, the harder it is to block. The pitches between two and three feet above the ground are the easiest to catch, and they get harder from there upwards. That was rather easy; the pitch width is more complicated:
What you see is the overall dependency and the absolute distances from the center of the plate. Before going into more detail, it can be said that in general, the pitches further away from the center of the plate are harder to block.*
* As I mentioned above, I ignore intentional balls and pitchouts. When I don’t, it looks as if a ball four to five feet off the plate is easier to catch than the one three feet off.
Three more factors come into effect regarding pitch width. The batter handedness is the first, and it is basically a disturbance factor. The pitches inside are harder to block than the ones outside because the catcher has to deal with the batter and his bat, which also obscure his view of the ball. The second factor is that the width of the pitch doesn’t matter equally at different lengths of the pitch. A pitch that is a foot away from the center of the plate will not increase the chances of a PP by the same rate when it is in the dirt as it will when it is belt high, so I had to use multiple regressions.
And, finally, the center of the plate is not the easiest place to catch the ball, but rather a spot about half a foot to the left of it, as seen from the catcher’s perspective. I assume it has something to do with the fact that all the catchers are right-handed.
Unlike batter handedness, the pitcher handedness didn’t influence the outcome, although it did seem so in the beginning. Right-handed pitchers appeared to be tougher to catch, but that turned out to be due to two factors we can otherwise control, pitch location and speed. More on speed in a second. On average, right-handers threw about a mile and a half per hour faster than their left-handed counterparts and they threw to the places where catchers have a tougher time catching the ball:
Speed matters, but mostly only on the pitches in the dirt. There was a strong correlation between pitch speed and PP percentage (controlling for the pitch location), but only on the pitches about a foot above the ground or lower:
As for the pitch movement, I saw some correlation between vertical movement and PP percentage on the balls in the dirt, but that mostly came by way of the speed correlation (faster pitches generally had more vertical movement). Horizontal movement did not seem to affect the probabilities on the short pitches, but showed some correlation on the higher ones. I decided against incorporating either of them into the model, as the correlations seemed inconclusive.
Instead of classifying the pitches by type, going with speed and movement gives a fairer comparison, as both Jamie Moyer and Justin Verlander throw “fastballs,” for example. But by the time I started with the individual rankings and saw where poor Jarrod Saltalamacchia ended up, it was clear to me that there is one pitch type that needs to be looked at separately:
On average, a knuckleball is seven times more likely to get away than other pitches of the same speed and in the same location. Not everybody has the same problems with the knuckleball, though. Here is the list of all the catchers with at least 100 knuckleball pitches that needed to be caught over the last four years:
These are rather small samples, but that’s all we have (we’ll get to what the numbers mean in a second; for now it’s enough to know that numbers in the last two columns are bad when negative and good when positive). Every catcher performs worse against a knuckleball, but Saltalamacchia seemed to do an even poorer job than the rest of them.
Once the ball is not cleanly fielded by the catcher, other factors come into play, too. How far away from the catcher did the ball end up? How fast are the runners involved? Were they inclined to run based on the score? How much respect do they have for the catcher’s throwing arm? How far do they have to run? The answers to the first four questions are evenly spread between “no idea” and “too much work to check it out,” but the last one is rather easy. We can check the base-runner state and how it affects the probability of a pitch getting away:
Scoring from third and reaching first happen less often than the model predicts, while advances to second and third are more likely than what is calculated from other factors. It makes sense, too. Second base means the longest throw, and while first and third are equally far away from the catcher, the big difference is in getting the jump. The runner from second already has the lead and has nothing to worry about but running. The batter who just struck out might not even realize he has a chance to run until it’s too late.
And here is the final one, the one that I found to be quite counter-intuitive. If a batter disturbs the catcher just by being there, him swinging will cause even more of a disturbance, right? So, controlling for everything else and looking just at swing-versus-no-swing states we come up with this:
The only explanation I can come up with is hit-and-run. The runner is being sent, the batter protects and swings at the bad pitch, the ball rolls away from the catcher. The runner from first would have made it to second anyway, but due to the fact that he started prior to the pitch, his advance is being credited as a stolen base and not as a variety of a PP. I let this one just be.
Putting everything together
I originally used only the 2010 and 2011 data to model the expectancies, because I wanted to use 2008 and 2009 as a sort of control group. That’s true for most of the regressions I used, although a few of them occurred to me after I had imported the 2009 season (base runners, pitch type and swinging). After importing 2008, I ran a number of checks, comparing what my model would expect and what really happened. I looked at close pitches, clearly wild pitches, slow ones, fast ones, the ones by left-handers, the ones by right-handers, split them by inning, year and month, and they all held up rather well. Here is the most random split I thought of:
I thought of using the day of the week, too, but I was afraid I could run into a replacement-catcher-on-a-Sunday-morning bias.
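A check of this kind boils down to summing the model’s per-pitch probabilities within each group and setting that total next to the passed pitches that actually happened. Here is a small Python sketch of the bookkeeping; the field names `p_pp` and `pp` are hypothetical, and this is an illustration of the idea rather than the code behind the checks described above.

```python
from collections import defaultdict


def expected_vs_actual(pitches, split_key):
    """Compare expected and observed passed pitches within each value of split_key.

    pitches: iterable of dicts holding the model probability under "p_pp",
    the 0/1 outcome under "pp", and the column named by split_key.
    (All of these field names are assumptions made for this sketch.)
    """
    totals = defaultdict(lambda: {"pitches": 0, "expected": 0.0, "actual": 0})
    for pitch in pitches:
        row = totals[pitch[split_key]]
        row["pitches"] += 1
        row["expected"] += pitch["p_pp"]  # expected PPs are the sum of probabilities
        row["actual"] += pitch["pp"]      # observed PPs
    return dict(totals)
```

Calling `expected_vs_actual(pitches, "month")` or `expected_vs_actual(pitches, "inning")` gives one row per group; if the model holds up, the expected and actual columns should stay close in every row.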
The bottom line is that this model has its inaccuracies. With what I looked at and the way I looked at it, they seemed to be acceptable. In no way, shape or form am I suggesting that it is perfect (and I’m certain that there will be objections and/or desired improvements out there), but in order to carry on from here I will use it as an evaluation tool for the catchers. For better or for worse.
You want some names, right? Here are some names. This is the list of the 15 best catchers in blocking pitches over the last four years:
What columns mean:
- cPP: Expected number of passed pitches from the model
- Pitches: Number of qualifying pitches as described above
- PP: Actual number of passed pitches
- PP+/-: The difference, with positive numbers indicating catchers who blocked more than their fair share of pitches
- Rpp+/-: The number of runs above or below average, using the conversion rate of 0.28 runs per passed pitch from The Book
- Rpp120: Prorated runs saved using 120 games and the league-average 42 PP-qualifying pitches per game
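The columns chain together in a straightforward way. The Python sketch below shows one reading of the definitions above; the sign of PP+/- follows from “positive means blocked more than his fair share,” while the exact prorating formula for Rpp120 (runs per qualifying pitch, scaled to 42 pitches a game over 120 games) is an assumption based on the column description.

```python
RUNS_PER_PP = 0.28                 # conversion rate from The Book, as quoted above
QUALIFYING_PITCHES_PER_GAME = 42   # league average, as quoted above
GAMES = 120


def blocking_runs(c_pp, pitches, pp):
    """Turn expected and actual passed pitches into the run columns."""
    pp_diff = c_pp - pp            # PP+/-: positive when fewer pitches got away than expected
    rpp = RUNS_PER_PP * pp_diff    # Rpp+/-: runs above or below average
    # Rpp120: assumed to be runs per qualifying pitch, scaled to a 120-game workload
    rpp120 = rpp / pitches * QUALIFYING_PITCHES_PER_GAME * GAMES
    return {"PP+/-": pp_diff, "Rpp+/-": rpp, "Rpp120": rpp120}
```

With made-up numbers, a catcher expected to allow 50 passed pitches over 5,040 qualifying pitches who allows only 40 comes out at PP+/- of 10 and about 2.8 runs saved; since 5,040 pitches is exactly 120 games at 42 per game, his Rpp120 is the same 2.8.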
And here are the 15 worst ones:
(complete data here)
The swing between the best and the worst catchers seems to be about one win a year. Or, put in absolute terms, over the last four years Yadier Molina’s performance blocking pitches was worth about three-and-a-half wins more than that of Miguel Olivo.
Is it a skill?
I used all the catchers with at least 1,000 chances in each of the last four years and split their even and odd years. This is how these two buckets compare:
(complete data here)
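For readers who want to repeat the even/odd comparison, here is a rough Python sketch. It assumes per-season totals of chances and PP+/- for each catcher, expresses each bucket as PP+/- per chance, and measures agreement with a Pearson correlation. The data layout, the per-chance normalization and the choice of Pearson’s r are all assumptions made for illustration, not a description of how the comparison above was computed.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def even_odd_rates(seasons, years=(2008, 2009, 2010, 2011), min_chances=1000):
    """Return {catcher: (even-year rate, odd-year rate)} of PP+/- per chance.

    seasons: {catcher: {year: (chances, pp_plus_minus)}} (hypothetical layout).
    Catchers missing a year or below min_chances in any year are dropped.
    """
    rates = {}
    for catcher, by_year in seasons.items():
        if any(y not in by_year or by_year[y][0] < min_chances for y in years):
            continue
        pair = []
        for parity in (0, 1):  # 0 = even years, 1 = odd years
            chances = sum(by_year[y][0] for y in years if y % 2 == parity)
            diff = sum(by_year[y][1] for y in years if y % 2 == parity)
            pair.append(diff / chances)
        rates[catcher] = tuple(pair)
    return rates
```

Feeding the even-year and odd-year rates of all qualifying catchers into `pearson` gives a single number: the closer it is to 1, the more a catcher’s blocking in one bucket tells you about the other.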
Glove versus arm
I mentioned that perhaps some catchers get good results blocking pitches because the runners are afraid to take their chances against good throwing arms. I checked the correlation between preventing base stealing and preventing advances on passed pitches, but found none:
Playing with the numbers
Recently, Mike presented his great research on catchers’ skills in framing pitches. FanGraphs offers the data on the catchers’ abilities to prevent stolen bases. What if we combined all these numbers for 2011?
(data for catchers with at least 500 defensive innings in 2011 here)
We see the heavy influence of the framing component. Alex Avila was below average both with his glove and his arm, yet he more than made up for it with the framing part. Mike Napoli lost his overall lead, but—for those of you counting at home—he was still five-and-a-half wins better than Jeff Mathis.
And, finally, I looked at the defensive talent spread observed in 2011, by defensive position. I included all players with at least 500 innings played at that position, and for everyone but catchers I used UZR/150 numbers. For catchers, I once used only the stolen bases component of their defense and once the cumulative number composed of all three components:
Or, perhaps, previous steps? There are at least two other pieces of research on this topic. Dan Turkenkopf wrote about it more than three years ago and Dave Allen took an approach similar to mine back in 2009. When I started my work, I was aware of the former, but not of the latter. Before you start asking me whether I’ve been living under a sabermetric rock or on a deserted island, let me preemptively admit: I did grow up on a small Mediterranean island that, by most standards, could be considered pretty deserted. So, I have that working for me.
What can be done next?
First, this model can be further improved upon. Just as I was finishing this article, I realized another dependency:
It took me a while to realize why the relative PP percentages went down in the last three rows. When there are runners on first, on first and second, or on every base and the count is already three balls, a passed pitch that was not swung at will be masked by the runners advancing on the base on balls. I’ll implement that into the model, but I do not expect any significant changes out of it.
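As a rough illustration of that masking rule, here is a tiny Python predicate. The function name and arguments are hypothetical, and it reads “not swung at” as “taken for ball four,” since only a ball would actually produce the walk; that reading is an assumption.

```python
# Base states in which every runner is forced to advance on a walk.
FORCED_STATES = ({1}, {1, 2}, {1, 2, 3})


def masked_by_walk(occupied_bases, balls, took_ball):
    """True if a passed pitch on this pitch would be hidden by a base on balls.

    occupied_bases: iterable of occupied bases, e.g. {1, 2}
    balls: balls in the count before the pitch
    took_ball: True if the batter did not swing and the pitch was a ball (assumption)
    """
    return balls == 3 and took_ball and set(occupied_bases) in FORCED_STATES
```

With runners on first and third, for instance, the function returns false, because only the three base states named above are treated as masked.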
The other thing we can see from this chart is that pitchers throw pitches that are tougher to block when they are ahead in the count. So, a possible further step would be to look at the whole issue from the pitchers’ side. Are pitchers more likely to go for the strikeout by bouncing the curveball in the dirt when they have a good blocking catcher? Is it quantifiable?
And, you know, you could always go discover Jovanotti’s music.
Published on THT on October 18th, 2011