Recently I was thinking about fielding data, something that’s admittedly an unperfected area of sabermetrics. Debate over just how imperfect the data is has been raging for some time now. I think most of us know that UZR and the like have serious flaws: prone to sample-size issues, the data can skew analysis in a variety of ways when used improperly. That being said, it’s currently the best we have. I think it can all be used as long as we keep the myriad problems with the data in perspective; I think of it as a legitimate asterisk when looking through fielding data.
So with all that being said, I got curious and wanted to look at the Yankees’ defensive data that’s available. As you’ll see, I ran into some problems, though, that I think should be addressed. So keeping in mind that this is a pretty imperfect study and just conversation fodder, away we go.
Did you know the Yankees have had the worst fielders in baseball since 1980? For the past 30 years the Yankees have had some pretty poor defenders. This probably isn’t surprising to fans who remember the days of Hideki Matsui, Bernie Williams and Gary Sheffield patrolling the outfield (we’ll see how horrifying this was shortly).
Of course, it wasn’t always like this. If you look at all defenders from 1920 to 2010, 90 years of baseball, the Yankees rank 3rd out of 30 teams in fielding runs above average. From 1950 to 2010 they were 8th. Then they really fall off a cliff. Looking at just players from 1970 to 2010, the Yankees rank 26th in defensive runs above average. Finally, from 1977 to 2010 the Yankees ranked dead last in defensive runs above average. Obviously, the fact that the Marlins, Rays, Diamondbacks and Rockies haven’t been around for very long skews those rankings.
I put together a quick leader board from Fangraphs to look at the worst Yankee defenders over their careers in pinstripes since 1920. I added wRC+ just to make everyone feel a bit better about this. Check it out:
What can you really say about Derek Jeter and his defense that hasn’t already been said? Et tu, Bernie? Just think back to 2004, folks, when Bernie, Matsui and Sheffield were all in the same outfield. That’s mind-numbing, isn’t it? Seeing so many familiar names from the past 20 years or so really explains why those defensive rankings plummeted the more the data was thinned out, no? Poor Mickey. What was he doing at 1st base? Those knees would have really benefited from some DH time.
After looking at this for a while, I realized there was a problem. This list is from Fangraphs, which uses UZR from 2002 on. Before that, it uses Sean Smith’s TZ data for the WAR components. So is it really fair to consider a list that mixes the two data sets? Not really. Take Jeter’s career, for instance: using TZ alone, he’s 129 runs below average. Both systems agree that Jeter has been really, really bad. Bernie Williams is a different story. From 2002 to 2006, TZ has him as a -70 fielder, which is still awful. However, when you swap the 2002-2006 TZ data for UZR, which is what these leader boards do, it gets even worse: UZR rates him as -110 for 5 SEASONS. So Bernie’s career fielding total looks this way:
TZ alone: -118
TZ plus UZR after 2002: -152.5
That’s a difference of 34.5 runs.
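For anyone who wants to check the arithmetic, here is a minimal sketch in Python that uses only the figures quoted in this post. The variable names are my own, and the 2002-2006 numbers are treated as the rounded values given above, so this is a back-of-the-envelope comparison and not a reproduction of how Fangraphs builds its totals.

```python
# Fielding runs relative to average for Bernie Williams, as quoted in the post.
# All names here are illustrative; the values are the post's own figures.
tz_career = -118.0        # career total using TZ alone
blended_career = -152.5   # TZ before 2002, UZR from 2002 on
tz_2002_2006 = -70.0      # TZ for 2002-2006 (rounded figure from the text)
uzr_2002_2006 = -110.0    # UZR for 2002-2006 (rounded figure from the text)


def run_gap(a: float, b: float) -> float:
    """Absolute difference, in runs, between two fielding totals."""
    return abs(a - b)


career_gap = run_gap(tz_career, blended_career)
window_gap = run_gap(tz_2002_2006, uzr_2002_2006)

print(f"Career gap (TZ alone vs. blended): {career_gap} runs")
print(f"2002-2006 gap (TZ vs. UZR): {window_gap} runs")
```

The two gaps do not match exactly, which is what you would expect if the 2002-2006 figures in the text are rounded. Either way, the systems disagree by several dozen runs on the same five seasons.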
So after bouncing around with this stuff, what’s my conclusion? I think it would be a whole lot fairer if the WAR numbers found on Fangraphs stuck to one fielding system or the other. The numbers you see on Fangraphs’ historical WAR charts use BOTH fielding systems: TZ for anything before 2002 and UZR for anything after. If you played both before and after 2002, your Fangraphs defensive WAR component uses both systems. That just seems inconsistent, no? Why not use TZ for any player whose career began before 2002?
I love WAR and the data that Fangraphs provides. I think we should always use the best available methods to help evaluate the game. For me, Fangraphs is the place to go for baseball statistics. I hope that someone can shed some more light on this issue, though, because it’s one I don’t really understand.