Baseball Notes: Offensive Discrimination

Although they may continue to cite them because of their familiarity as reference points, baseball analysts largely have moved on from the historically conventional hallmarks of pitcher and batter performance– ERA and batting average (“BA”), respectively– in favor of more comprehensive metrics that provide a more accurate picture of player performance by addressing some of those traditional statistics’ blind spots.

Focusing here on hitters, some of BA’s most notable blind spots include walks; the fact that each park has different dimensions; and the significant variance in the values of different types of hits (e.g., a single versus a home run). As they have with WAR, the three main baseball-analytics websites each offer their own improved versions of BA: Baseball Prospectus’ True Average (“TAv”); Baseball-Reference’s adjusted on-base-plus-slugging (“OPS+”); and FanGraphs’ Weighted Runs Created Plus (“wRC+”). Visually, TAv looks like a batting average but is scaled every year such that an average hitter has a TAv of .260, while OPS+ and wRC+ are scaled to an average of 100.

If you’ve read baseball articles here or at those websites, then you’ve seen those metrics cited, sometimes seemingly interchangeably, in the course of an examination of hitting performance. As BP’s Rob Mains notes in the first part of a recent two-part series at that site, there’s good reason to treat these three metrics similarly: they all correlate very strongly with each other. (In other words, most batters who are, for example, average according to TAv (i.e., .260) also are average according to OPS+ and wRC+ (i.e., 100).)

There are differences between the three, however, and those differences arise because each regards the elements of batting performance slightly differently. As Rob explained:

How the three derive the numbers themselves, including their respective park factors, is pretty small ball. Bigger ball, though, is what goes into them.

  • OPS+ incorporates the same basic statistics as OPS: At-bats, hits, total bases, walks, hit by pitches, and sacrifice flies.
  • wRC+ weights singles, doubles, triples, home runs, walks, and HBPs, with the weighting changing from year to year. For example, a home run had a weight of 2.337 in 1968 but only 1.975 in 1996, reflecting the scarcity of runs in the former year. Additionally, wRC+ considers only unintentional walks.
  • TAv also weights outcomes, including strikeouts (slightly worse than other outs) and sacrifices (slightly better than other outs). TAv also includes batters reaching base on error and incorporates situational hitting[, which refers to hitting that occurs only when runners are on base: Sacrifice hits, sacrifice flies, and hitting into double and triple plays].

So while all three measures look at the same thing—hitting—they’re not doing it quite the same. For OPS+, a walk is as good as a hit, from an OBP perspective, and a home run is four times as good as a single, per SLG. FanGraphs’ wRC+ weights them, but it doesn’t weight outs, as TAv does. Only TAv considers situational hitting.
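To make the OPS half of that concrete, here is a minimal sketch of plain OBP, SLG, and OPS using the standard formulas and the six inputs listed above. The `BattingLine` class and every number in it are made up for illustration, and this is raw OPS only: it does not attempt the park and league adjustments that turn OPS into OPS+.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter (hypothetical container, invented values)."""
    at_bats: int = 0
    singles: int = 0
    doubles: int = 0
    triples: int = 0
    home_runs: int = 0
    walks: int = 0
    hit_by_pitch: int = 0
    sac_flies: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        # A single is worth one base, a home run four.
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)


def obp(line: BattingLine) -> float:
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    reached = line.hits + line.walks + line.hit_by_pitch
    chances = line.at_bats + line.walks + line.hit_by_pitch + line.sac_flies
    return reached / chances if chances else 0.0


def slg(line: BattingLine) -> float:
    """Slugging percentage: total bases / at-bats."""
    return line.total_bases / line.at_bats if line.at_bats else 0.0


def ops(line: BattingLine) -> float:
    return obp(line) + slg(line)


# Same hitter, one extra plate appearance: a walk versus a single.
plus_walk = BattingLine(at_bats=500, singles=100, walks=1)
plus_single = BattingLine(at_bats=501, singles=101)
print(obp(plus_walk) == obp(plus_single))  # OBP cannot tell them apart

# One hit in four at-bats: a single versus a home run.
one_single = BattingLine(at_bats=4, singles=1)
one_homer = BattingLine(at_bats=4, home_runs=1)
print(slg(one_homer) / slg(one_single))  # the home run counts four times as much
```

A linear-weights metric like wRC+ would swap those fixed 1-2-3-4 values and the walk-equals-hit treatment for yearly run values, which is where the divergence Mains describes comes from.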

When applied to players who are especially good or bad in those areas where the three metrics diverge, the result is a lack of correlation between the three with respect to that player. (Cf. the divergent views of the three WAR metrics with respect to Robbie Ray.) Mains’ second article examines some of those players of whom TAv, OPS+, and wRC+ take different views (e.g., Barry Bonds, Kris Bryant, Ian Kinsler, and David Ortiz) before explaining a few general conclusions:

[TAv, OPS+, and wRC+ are] very similar. You can use any of them and feel confident that you’re usually capturing the key characteristics of a batter.

If you want to drill down, though, here are the differences I found:

  • The lack of weighting in OPS+ means that it gives slightly less weight to singles and slightly more weight to home runs and walks than TAv and wRC+.

  • TAv’s inclusion of situational hitting means that batters who are extremely good or bad at avoiding double plays are going to get rewarded or penalized. (Situational hitting also includes bunting, but nobody does that anymore anyway.)

  • The black box factor in these calculations is park factors. Each of the three sites calculates them their own way. They can account for some changes, though not in a predictable or transparent way like high walk totals or low GIDP rates can.

I expect I’ll continue to use these three metrics somewhat interchangeably in articles at this site, although my preexisting (mostly uneducated) preferences for TAv and wRC+ likely will continue. Articles like Rob’s serve as both an important reminder that, at the edges, these updated metrics aren’t exactly the same and an entry point into thinking more precisely about what we ourselves value in the process of evaluating hitter performance.

___________________________________________________________________

Previously
Baseball Notes: Current Issues Roundup
Baseball Notes: Baseball’s growth spurt, visualized
Baseball Notes: The WAR on Robbie Ray
Baseball Notes: Save Tonight
Baseball Notes: Current Issues Roundup
Baseball Notes: The In-Game Half Lives of Professional Pitchers
Baseball Notes: Rule Interpretation Unintentionally Shifts Power to Outfielders?
Baseball Notes: Lineup Protection
Baseball Notes: The Crux of the Statistical Biscuit
Baseball Notes: Looking Out for Number One
Baseball Notes: Preview

The Best Baseball Research of the Past Year

Once again, the Society for American Baseball Research has chosen fifteen (non-ALDLAND) finalists for awards in the areas of contemporary and historical baseball analysis and commentary.

My latest post at Banished to the Pen highlights each finalist. The winners will be announced on Sunday.

The full post is available here.

The Best Baseball Research of the Past Year

Once again, the Society for American Baseball Research has chosen fifteen (non-ALDLAND) finalists for awards in the areas of contemporary and historical baseball analysis and commentary, and they are holding a public vote to determine the winners.

My latest post at Banished to the Pen highlights each finalist and includes a link to cast your vote to help determine the winners.

Read the full post, which includes summaries of each of the fifteen nominated pieces; reveals my ballot; and includes some general comments on this year’s selections here.

2016 Oregon is the Oregon Everyone Thought They Were Watching for the Last Decade

There is a myth that exists in college football that some really good teams are great offenses with bad defenses. These teams win games by scores like 62-51 or 45-38, and, so the theory goes, they are just good enough on offense to outscore any opponent.

In reality, all great teams are fairly complete, meaning that they are good in all phases of the game. You can’t really be a great team if you have a bad defense. What apparently fools everybody is the fact that football is a game with no set pacing. A baseball game is nine innings, or twenty-seven outs if you prefer. Golf is eighteen holes. A set in tennis is six games. Games like football and basketball are different. A football game can range, at the extremes, from something like seven possessions (this year, Navy v. Notre Dame) to as many as seventeen or eighteen. The typical range is more like 9-10 for a low-possession game, and perhaps fifteen for a high-possession game. But, as with basketball, certain teams tend to play high-possession games, and certain teams tend to play low-possession games. Teams that play high-possession games generally feature hurry-up offenses, or pass-happy offenses, or defenses that prefer to gamble for stops rather than playing “bend but don’t break”. Teams that play low-possession games will be teams that run the ball a lot, or play conservative defense that seeks to avoid giving up big plays at the expense of allowing lots of first downs. As should be somewhat obvious, teams that play high-possession games tend to score more points, and they allow more points, all else being equal. For some reason, we collectively seem to appreciate this in basketball, and we don’t necessarily consider low-scoring teams to be “bad” on offense. We look to efficiency rankings instead.

Football analysis is catching up, but nobody seems to be taking notice. The stats I will be quoting are from Football Outsiders (very good site if you’ve never seen it).  This site ranks offenses and defenses as units, based on some advanced per-possession stats that attempt to adjust for quality of opponent. This is obviously an imperfect process, but in my opinion it provides much better information than simply saying that, because a team averages 35.6 points per game, they are “good” on offense.
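As a rough illustration of why per-possession numbers can tell a different story than points per game, here is a small sketch. The two teams and all of their numbers are invented, it assumes each side gets the same number of possessions in a game, and it makes no attempt at the opponent adjustments that Football Outsiders layers on top.

```python
def points_per_possession(points: float, possessions: int) -> float:
    """Raw scoring efficiency: points divided by possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return points / possessions


# Hypothetical per-game averages for two made-up teams.
hurry_up = {"points_for": 42, "points_against": 35, "possessions": 16}
ball_control = {"points_for": 27, "points_against": 20, "possessions": 9}

for name, team in (("hurry-up", hurry_up), ("ball-control", ball_control)):
    offense = points_per_possession(team["points_for"], team["possessions"])
    defense = points_per_possession(team["points_against"], team["possessions"])
    print(f"{name}: {offense:.2f} scored and {defense:.2f} allowed per possession")
```

In this invented case the hurry-up team scores fifteen more points per game, yet the ball-control team is the more efficient offense, and the two defenses that look fifteen points apart on the scoreboard are nearly identical per possession.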

Oregon has long had a reputation as a high-flying offense and a poor defense. I think it is time to challenge that assumption. Offensively, they’ve been good, no question. Since 2007, their offensive ranks have been 7th, 13th, 11th, 11th, 5th, 2nd, 6th, 1st, 13th, and 18th. This year, their 18th rank is their worst on offense in a decade. That’s pretty good. But what about defense? Since 2007, they have been 19th, 42nd, 22nd, 5th, 9th, 4th, 29th, 28th, 84th, and 126th. Raise your hand if you are surprised, particularly about the stretch from 2010 to 2012 (5th, 9th, and 4th). The 2010 national title game was billed as two great offenses against two bad defenses (Auburn and Oregon), yet somehow, those two defenses held the great offenses to some of their lowest point totals all season (22 to 19). Turns out, when analyzed properly, both were great defenses as well (that just so happened to be playing with extreme, hurry-up offenses, so they played many high-scoring games).

I still consider the absence of a playoff in 2012 to be a travesty. 2012 Oregon vs. 2012 Alabama would have been a great game, and we needed to see it. If only somebody could have beaten Notre Dame during the regular season…  Oh well.

In any event, Oregon’s national-title-contender status from 2010 to 2014 was based upon great offense AND great defense. Last year they still managed to be 9-4, a pretty good year, with the 84th defense. But this year, with a truly terrible defense, they are 4-8, despite still having a great offense.

And that is normal. Many teams follow that formula: for example, 2013 Indiana (8th offense, 105th defense, 5-7 record), 2012 Baylor (5th offense, 94th defense, 8-5 record), 2010 Michigan (8th offense, 107th defense, 7-6 record), and 2009 Stanford (4th offense, 104th defense, 8-5 record). Another prominent team that had this reputation was West Virginia under Rich Rodriguez. As a 2007 national title contender that lost in an upset to Pitt to drop out of the title game, then routed Oklahoma in the Fiesta Bowl, they were 3rd on offense and 9th on defense. Not quite what most people thought.

The bottom line is that you won’t be a great team without being at least good on defense. There may be an exception or two (I haven’t researched every team from all time), but the general rule is pretty clear: if you are an elite offense and a below-average defense, you will be .500 or maybe a little better. 8-5 or 9-4 is about the best you can possibly do, and most do worse. Anybody winning 11 or 12 games has a good defense. Don’t be confused if a team like that sometimes gives up a lot of points. Maybe they are playing against a great offense, and/or defending more possessions than most other teams. If they are 12-2, it’s virtually guaranteed they’ve got a strong defense. Don’t believe the myth.

Baseball Notes: The WAR on Robbie Ray

There are a few things we know with reasonable certainty about Robbie Ray. He was born on October 1, 1991, just south of Nashville in Brentwood, Tennessee. In 2010, the Washington Nationals drafted him in the twelfth round of the amateur draft. The Nationals traded him, along with two other players, to the Detroit Tigers in 2013 in exchange for Doug Fister. A year later, the Tigers traded him to the Arizona Diamondbacks as part of a three-team trade that netted the Tigers Shane Greene and the New York Yankees Didi Gregorius. So far, Ray has seen major-league action as a starting pitcher with the Tigers and Diamondbacks. He showed promise in his first three appearances (two starts and an inning of relief) for Detroit. He showed less promise in his remaining six appearances– four starts and two relief innings– for that team. Things have ticked back up for Ray since his arrival in the desert, however.

__________________________________

Most baseball fans likely have some familiarity with the player-valuation concept of wins above replacement player, usually labeled WAR. What many fans may not realize, however, is that there actually are three different versions of the WAR statistic. The goal of each version is the same: to determine a comprehensive valuation of an individual baseball player. Each takes a slightly different path to reach that comprehensive valuation, but they typically reach similar conclusions about a given player, such that it’s common to see or hear a player’s WAR cited without specific reference to the particular version utilized.

For example, the three versions– Baseball-Reference’s WAR (“rWAR”), FanGraphs’ WAR (“fWAR”), and Baseball Prospectus’ WARP (“WARP”)– all agree that Mike Trout had a great 2016. He finished the season with 10.6 rWAR, 9.4 fWAR, and 8.7 WARP, good for first, first, and second by each metric, respectively. For another example, they also agree about Trout’s former MVP nemesis, Miguel Cabrera: 4.9 rWAR, 4.9 fWAR, 3.9 WARP. (In my anecdotal experience, WARP tends to run a little lower than rWAR and fWAR for all players.)

__________________________________

While the WAR varietals typically and generally concur, that isn’t always the case. Pitchers can be particularly susceptible to this variance, because the measurement of pitching performance is one of the areas in which the three metrics are most different.

Baseball Notes: Save Tonight

It is an accepted reality that, in general, baseball players don’t have much time for their sport’s new and advanced statistics and metrics. In many ways, this resistance makes sense. In the moment, when standing on the mound or in the batter’s box, there’s only so much thought and information a player can hold in his mind while trying to accomplish the task– make or avoid contact between bat and ball, for example– at hand. Players, like experts in other fields, also understandably tend to be skeptical of outsiders’ ability to provide baseball analysis or insight superior to their own. This skepticism is fairly well documented, most obviously when it involves changes that might impair or decrease a player’s value or role in the game, and, more surprisingly, even when new statistical revelations work in a player’s favor. (There certainly are some players, like Jake Lamb and Trevor “Drone Finger” Bauer, who have embraced sabermetric thinking, but it’s reasonable to assume they remain in the minority among their colleagues.)

A primary impetus of baseball’s sabermetric movement has been to encourage the abandonment of certain traditional statistics that, while still largely entrenched in the sport, are understood to be incomplete in important ways or much less meaningful than their use might suggest. Batting average, for example, doesn’t include walks. (Cf. on-base percentage.) RBIs require a player’s teammates to reach base ahead of him. ERA depends, to a significant extent, on a pitcher’s defensive teammates and other factors outside a pitcher’s control. (Cf. defense-independent pitching statistics like FIP and DRA.) Pitcher wins and saves are artificial, highly circumstantial metrics that, at best, indirectly measure pitching talent.

For years, analysts have pushed baseball to rid itself of these traditional performance measures. There’s a comfort in hanging onto the statistical language with which we grew up as we learned and discussed the game, but that comfort should turn cold upon learning the degree to which these familiar stats obscure what’s really happening on the field.

So long as baseball’s current player-compensation structure remains in place, though, players aren’t likely to stop caring about things like saves; after all, that’s how they’re paid:

[Image: screenshot-2016-10-19-at-5-46-24-pm]

In the course of discussing whether departures from conventional reliever usage as particularly exhibited in the 2015 and 2016 playoffs are likely to bleed over into upcoming regular season play, FanGraphs’ Craig Edwards explains one reason players are likely to prefer conventional, save-oriented bullpen strategy:

Saves get paid in a big way during arbitration. Only one player without a save, Jared Hughes, received a free-agent-equivalent salary above $6.5 million in arbitration, while all 16 players who’d recorded more than 10 saves received more than Hughes in equivalent salary. Players are more than happy to make more money, so giving more relievers higher salaries and more multi-year deals is openly welcomed. Taking saves away, however, also takes money away from players with less than six years of service time.

Although there are a number of not-uncompelling reasons why players prefer to steer clear of baseball’s newer metrics, Edwards has fingered one of the most forceful. If fans and analysts want to hear players discuss OBP, DRA, and leverage, they ought to channel their persuasive efforts less toward appeals to players’ logical sensibilities (they get it, no doubt) and more toward the education of the MLB salary arbitrators, to whom the players already listen with great attention.

___________________________________________________________________

Previously
Baseball Notes: Current Issues Roundup
Baseball Notes: The In-Game Half Lives of Professional Pitchers
Baseball Notes: Rule Interpretation Unintentionally Shifts Power to Outfielders?
Baseball Notes: Lineup Protection
Baseball Notes: The Crux of the Statistical Biscuit
Baseball Notes: Looking Out for Number One
Baseball Notes: Preview

The Best Baseball Research of the Past Year

Once again, the Society for American Baseball Research has chosen fifteen (non-ALDLAND) finalists for awards in the areas of contemporary and historical baseball analysis and commentary, and they are holding a public vote to determine the winners.

My latest post at Banished to the Pen highlights each finalist and includes a link to cast your vote.

As a preview, here are my summaries of my two favorite articles of the bunch:    Continue reading

Snapshot: How good has the Detroit Tigers starting rotation been to date?

No, Justin Verlander hasn’t appeared in a single game this season. Yes, it’s still early in the year to be issuing deeply meaningful assessments of baseball team performances. No, I still have not pulled together a proper introductory post for this season’s Tigers series. Instead, you’ll have to get by with this extensive team season preview, which remains not wholly inaccurate, a writeup on Detroit’s bounceback from its first loss in Pittsburgh, a quick peek at changes in team base-stealing profiles, a podcast from earlier this week, and the following snapshot of the Tigers’ rotation through twenty-one games.

This morning, Baseball Prospectus released a new pitching metric, Deserved Run Average (“DRA”), which is designed as a replacement for ERA. You can read more about DRA here (and a nauseatingly detailed exposition of it here), but the one-line summary is simple: “By accounting for the context in which the pitcher is throwing, DRA allows us to determine which runs are most fairly blamed on the pitcher.” After all, that’s what we want to know when we look at a pitcher’s ERA. DRA, it would appear, allows us to know that with greater accuracy.

With that new tool in hand, here are 2015’s most valuable pitchers so far, factoring in their newly calculated DRA:

[Image: dra-pwarp-4-29-15]

Plenty of familiar names on that list, especially for Tigers fans, who will find all five of this season’s starters– David Price (#2), Alfredo Simon (#7), Shane Greene (#9), Anibal Sanchez (#15), and even Kyle Lobstein (#22)– among the thirty most valuable pitchers of this young season.

Through thick and thin offensive production thus far, plenty of credit for the team’s 14-7 record is due to the starting rotation, which, you need not be reminded, unloaded Max Scherzer, Doug Fister, and Rick Porcello in the past two offseasons. Surprisingly, so far, so good.

The moral implications of StatCast

If your neighborhood baseball nerd is nerding out a little more than usual today, it’s probably because Pluto’s in retrograde right now or something, and it definitely doesn’t have anything to do with tonight’s television broadcast debut of StatCast, which will go far beyond showing balls and strikes by tracking things like player movements and batted-ball data. Ben Lindbergh has a good preview of the technology and its chief implications for expanded baseball analysis here.

Taking a pass on new hockey statistics

A quick refresher on hockey’s new statistics: puck possession correlates more strongly with winning than do things like goals or shots; measuring possession in a fluid game like hockey is difficult; as a practical solution, Corsi and its less-inclusive sibling, Fenwick, are statistics that track certain, more easily measured events (all shots, including on-goal shots and missed shots, and, in Corsi’s case, blocked shots), thereby serving as proxies for possession and, therefore, indicators of team success. Once you get past the names (as the NHL is in the process of doing), the concept is simple.

One way to improve Corsi might be to make it more comprehensive. If Corsi approximates possession by counting certain indicia of possession, it stands to reason that a similar metric could better approximate possession by counting more indicia of possession. In looking for other things to add, and keeping in mind that the practical computational benefit of Corsi is that it is composed of easily tallied events, pass attempts– including both completed and unsuccessful passes– would seem to meet both criteria. Pass attempts indicate possession the same way shot attempts, as broadly defined under Corsi, do, and they should be nearly as easy to count.
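For anyone who wants to see the counting spelled out, here is a minimal sketch of Corsi and Fenwick as simple event tallies, plus a pass-inclusive variant along the lines suggested above. The event labels, the `(team, kind)` log format, and the sample game are all assumptions made up for illustration, blocked shots are credited to the shooting team, and the pass-inclusive tally is this post's hypothetical idea rather than an established statistic.

```python
# Event kinds counted by each metric (labels are hypothetical).
FENWICK_EVENTS = {"shot_on_goal", "missed_shot"}
CORSI_EVENTS = FENWICK_EVENTS | {"blocked_shot"}
PASS_EVENTS = {"completed_pass", "failed_pass"}
CORSI_PLUS_PASSES = CORSI_EVENTS | PASS_EVENTS  # the proposed expansion


def tally(events, team, kinds):
    """Count the events of the given kinds attempted by one team."""
    return sum(1 for who, kind in events if who == team and kind in kinds)


def share(events, team, kinds):
    """One team's share of all events of the given kinds (0.0 if none)."""
    total = sum(1 for _, kind in events if kind in kinds)
    return tally(events, team, kinds) / total if total else 0.0


# An invented stretch of play, logged as (team, event kind) pairs.
events = (
    [("home", "shot_on_goal")] * 3
    + [("home", "missed_shot")] * 1
    + [("home", "blocked_shot")] * 2
    + [("home", "completed_pass")] * 10
    + [("home", "failed_pass")] * 2
    + [("away", "shot_on_goal")] * 2
    + [("away", "missed_shot")] * 2
    + [("away", "completed_pass")] * 6
    + [("away", "failed_pass")] * 4
)

for label, kinds in (("Fenwick", FENWICK_EVENTS),
                     ("Corsi", CORSI_EVENTS),
                     ("Corsi plus passes", CORSI_PLUS_PASSES)):
    print(f"{label}: home share {share(events, 'home', kinds):.3f}")
```

Note that nothing in the tally cares which kind of event it is counting once the event is in the set, which is the sense in which a more inclusive metric needs no extra coding of its components.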

I can think of two potential reasons why it might not make sense to expand Corsi to include pass attempts: 1) it is significantly more difficult to identify and count pass attempts than the shot attempts already being tracked, and 2) adding pass attempts to a possession proxy metric like Corsi does not significantly increase the value of the metric.

While the first might be true, counting pass attempts also may make the overall collection job easier. For the limited purposes of a relatively simple metric like Corsi, there should be no need to code or label the component events compiled into the single Corsi output, so adding pass attempts would save trackers from having to decide whether to include or exclude an ambiguous shot-attemptish thing. As for the second, I attempted to address this with someone who has written on the general subject, but, likely due to my own ineptitude, the exchange resembled two ships passing in the night, which is a terrible and sufficient way to conclude this post.

________________________________________________________

Related
Bouncing puck: Passing, not shooting, is the key to scoring on the ice and the hardcourt
More on passing data and the shot quality debate (Hockey Prospectus)
There’s no such thing as advanced sports statistics