This is one area where I part ways with many of my online friends who are statheads: the mantra that small samples offer no valuable information. Many SABR devotees will instantly dismiss pitcher-vs-batter matchups, since you’re dealing with a tiny handful of ABs. They will further criticize managers like Girardi, who look at this info, for using old, outdated methods. Check out this exchange from a while back with Rob Neyer in an ESPN chat:
John (New York, NY): Rob, the sample size of batter/pitcher matchups is of particular interest to me. Obviously a sample size of 5-10 PAs against a single pitcher does not yield any useful data. However, when you consider that in those 5-10 PAs, a single batter is only facing the repertoire of a single pitcher, my question is how many PAs are required before the data becomes significant? 20? 50? More? What do you think?
Rob Neyer: More than 20. I’m not sure if 50’s enough. I’m not sure if any batter has ever faced a pitcher enough times to show us anything truly meaningful. I think what makes more sense is looking at how a hitter has fared against *types* of pitchers.
I’m a fan of Rob’s, a regular reader of The Sweet Spot, and I generally enjoy his work, but this couldn’t possibly be more wrongheaded. Why on Earth would you value a hitter’s ABs against generic left-handers more than his ABs against the specific one he’s facing? Just for the larger sample? That’s just silly. Every pitcher is different in terms of repertoire, release point, velocity, how his ball moves, and so on. Some batters see the ball great out of one pitcher’s hand and just can’t pick it up against the next. The more specific the info, the better. But a by-product of specificity is that samples get smaller and smaller, and thus get dismissed by those who see the game only by the numbers. Because of this, some SABR devotees wind up completely missing the situational side of the game, which is how most baseball professionals look at it.
I’m not anti-stat, and I fully understand the concept of statistical noise, but there is such a thing as qualitative analysis, and it is rightfully used in baseball all the time. What the quants don’t get is that when a manager looks at this info, he’s not just looking at the numbers. He’s looking at the individual plate appearances (BR has them, subscription required) and their outcomes to see if there’s anything useful there. Sometimes there is, sometimes there isn’t. Here’s a link to give an outline of what I’m talking about.
The ‘Play Index’ will say things like “Line drive to Left”, “Infield ground ball to SS” or “Strikeout”. So a player might be 2-14 against a certain pitcher, but 9 of the 12 outs were hard line drives. That’s valuable info. It tells you he hits the pitcher well and has just had bad luck that’s due to turn around. When Girardi cites stuff like this, you never hear him say “I batted Hitter X vs Pitcher Y because he was 2-14 against him”. Rather, he says things like “I batted him because he’s had good ABs against that pitcher”. The ‘Play Index’ is the kind of stuff he’s referring to in that much-maligned binder of his, and I’ll bet his info even goes beyond what is publicly available at BR, with advanced scouting reports and whatnot. But it’s also a great example of how there is valuable info in small samples if you look at the game in a more situational way, and not just purely statistically.
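To make the 2-14 example concrete, here is a rough Python sketch of the kind of tally I mean. The outcome strings are made up for illustration (they are not BR's actual wording), the `summarize` helper is hypothetical, and I'm assuming every entry is an official at-bat and that a "Lineout" counts as hard contact.

```python
# Hypothetical play-by-play log for one hitter against one pitcher.
# The descriptions are illustrative only, not Baseball-Reference's real format.
plate_appearances = (
    ["Single to CF"] * 2        # the 2 hits
    + ["Lineout to LF"] * 9     # hard-hit outs
    + ["Groundout to SS"] * 2   # weak contact
    + ["Strikeout"]             # no contact
)

HIT_PREFIXES = ("Single", "Double", "Triple", "Home Run")


def summarize(pas):
    """Return the raw batting line plus the share of outs that were line drives.

    Assumes every entry is an official at-bat (no walks, HBP, or sacrifices).
    """
    at_bats = len(pas)
    hits = sum(1 for pa in pas if pa.startswith(HIT_PREFIXES))
    outs = at_bats - hits
    hard_outs = sum(1 for pa in pas if pa.startswith("Lineout"))
    return {
        "at_bats": at_bats,
        "hits": hits,
        "outs": outs,
        "hard_outs": hard_outs,
        "avg": round(hits / at_bats, 3),
        "hard_out_share": hard_outs / outs if outs else 0.0,
    }


summary = summarize(plate_appearances)
print(summary)
```

The batting line alone says .143. The extra two fields say 9 of the 12 outs were line drives, which is the part of the record a manager reading the individual plate appearances would actually be looking at.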