Lessons in Sabermetrics: Evaluating Pitchers Beyond Wins, Losses, and ERA

Pitchers have historically been defined by the number of Wins they accumulate over a season. Awards have been won and careers enshrined in part because of this number. The problem with this? Wins shouldn’t be trusted.

ERA has held a similar value in the eyes of baseball fans for generations. The issue? ERA doesn’t paint the full picture, and incredible talents have been lost to the depths of time because of a number that overlooked the quality of their pitching.

This is common with traditional baseball statistics, and has led to the adoption of metrics that hope to tell a more complete story.

What are the “traditional” pitching statistics?

If you’ve ever tuned in to a baseball game, I’m sure you’ve been reminded how many Wins (W) and Losses (L) a starting pitcher has accumulated over the season, as well as where their ERA (Earned Run Average) stands. Historically, these have been the most heavily considered statistics in award voting and Hall of Fame voting. Pitchers have been rewarded with large contracts because of these numbers. Let’s start with Wins and Losses, the two possible “decisions” for a starting pitcher.

MLB defines a Win as when “[the starting pitcher] is the pitcher of record when his team takes the lead for good.” It’s worth noting a starter must also pitch at least 5 innings to qualify for a W. On the other hand, a Loss, as defined by MLB, is when “a run that is charged to [the pitcher] proves to be the go-ahead run in the game, giving the opposing team a lead it never gives up.” A pitcher does not need to pitch a minimum number of innings to earn an L. Together, these statistics create the W-L record. Simple, right? The current leader in wins in 2025 is Freddy Peralta of the Brewers, who has 15 (and 5 Losses).

ERA gives an idea of how many runs a pitcher gives up per 9 innings. Basically, it takes the total number of earned runs a pitcher has given up, divides by the total number of innings he has pitched, and multiplies that number by 9. Pitchers who allow fewer runs are going to have lower ERAs. The current ERA leader in 2025 is Pittsburgh’s Paul Skenes, who sits at a 2.07 ERA. The MLB average for ERA in 2025 is 4.14, according to Baseball-Reference.
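If it helps to see that arithmetic written out, here is a minimal Python sketch of the ERA calculation described above. The function name and the sample numbers (50 earned runs over 180 innings) are made up for illustration and don't belong to any real pitcher.

```python
def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings.

    innings_pitched is a true decimal value (6 and 1/3 innings = 6.333...),
    not box-score shorthand like "6.1".
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical season: 50 earned runs allowed in 180 innings.
sample_era = era(50, 180)
print(f"{sample_era:.2f}")  # 2.50
```

The only thing to be careful about is how innings are written: a box score’s “6.1” means six and one-third innings, so convert to thirds before dividing.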

Should we trust these numbers?

No.

I say that more so for wins and losses than for ERA, and I’ll tell you why. 

Let’s start with a hypothetical: What if your starting pitcher pitched 7 wonderful innings of 1-run baseball? That one run was a solo home run, but it was the only damage allowed by your hurler. Unfortunately, despite terrific pitching from the starter, his supporting offense failed to score a run all game and the game finished 1-0. Our starter that threw 7 wonderful innings of 1-run baseball? He takes a loss. He didn’t lose his team the game, though, did he? Right.

So there, we see an issue with the win-loss statistic: it’s been a pitcher-defining statistic, despite it relying so heavily upon an entire lineup other than the pitcher himself! Let’s take a look at Jacob deGrom’s 2018 season, when he took home his first NL Cy Young Award. 

deGrom boasted a line that teams can only dream of from their ace: 217 innings, a 1.70 ERA. It was one of the greatest seasons from a pitcher that baseball has seen, and deGrom’s W-L record from that year? 10-9. The 2018 New York Mets (deGrom’s team) finished 77-85 and scored the 8th fewest total runs in baseball. Historically, deGrom wouldn’t have been looked at twice before being eliminated from consideration in favor of 20+ win pitchers. In 2018, however, he won.

ERA is a trustworthy statistic: it tells us how many runs a pitcher allows. It’s the easiest way to determine if the pitcher has succeeded or failed at doing his job, which is to not allow runs. However, to look solely at ERA would be a mistake (there are other statistics I’ll get into later that provide context to our simple ERA). ERA doesn’t take team defense into account. If Team A is supremely talented defensively compared to Team B, a pitcher on Team A is not going to allow as many runs as an equally talented pitcher on Team B.

There are more shortcomings of ERA to consider, but we’re going to focus on team defense and luck today. Fear not, there are ways to account for both and add context to a pitcher’s surface-level numbers, and that’s what we’ll get into now.

Find out if a pitcher’s ERA is affected by their team defense and luck with FIP and BABIP

FIP stands for Fielding-Independent Pitching and accounts for the fact that a pitcher cannot control the quality of his own defense. So, it completely strips that part out of the equation and focuses on what it believes are the main events in the pitcher’s control: strikeouts, walks (including HBP) and home runs. What’s nice is that FIP is scaled to ERA, making it easy to read and compare the two. To get them on the same scale, a “FIP constant” is added to the result of the equation. That constant is generally around 3.10, according to FanGraphs.

FIP formula, taken from FanGraphs

The equation may look scary, but it’s really just telling us that FIP takes home runs, walks (including hit by pitches) and strikeouts as its main inputs (along with the weights of each that you don’t need to worry about) and divides by a pitcher’s total number of innings pitched.
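For anyone who would rather read the formula as code, here is a small Python sketch. It assumes the form FanGraphs publishes, FIP = (13×HR + 3×(BB + HBP) − 2×K) / IP + FIP constant, and uses the rough 3.10 constant mentioned above as a default (the real constant changes a little every season). The function name and the sample stat line are hypothetical.

```python
def fip(hr: int, bb: int, hbp: int, k: int, innings_pitched: float,
        fip_constant: float = 3.10) -> float:
    """Fielding-Independent Pitching, using the FanGraphs weights.

    Home runs and free passes (walks + hit by pitches) push FIP up,
    strikeouts pull it down. fip_constant rescales the result to ERA;
    3.10 is only an approximate default, not a specific season's value.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    raw = (13 * hr + 3 * (bb + hbp) - 2 * k) / innings_pitched
    return raw + fip_constant


# Hypothetical stat line: 20 HR, 50 BB, 5 HBP, 200 K in 180 innings.
sample_fip = fip(hr=20, bb=50, hbp=5, k=200, innings_pitched=180)
print(f"{sample_fip:.2f}")  # 3.24
```

Notice that hits on balls in play never appear anywhere in the function. That is the whole point of the stat.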

FIP can be seen as more of a predictive stat, while ERA is results-based (earned runs allowed). I say this because if a pitcher is being dinged ERA-wise by his team defense, FIP still gives us a solid idea of how he’s controlling what he can control: strikeouts, walks and home runs. As with most things in baseball, luck evens out eventually, and we can expect the pitcher’s results to regress toward how he’s expected to pitch.

BABIP, or Batting Average on Balls in Play, is a data point that adds further context to FIP and ERA. I know, batting average is misleading, but BABIP may actually tell us a great deal about a pitcher’s defense and plain luck.

The MLB website defines BABIP as “a player’s batting average exclusively on balls hit into the field of play, removing outcomes not affected by the defense (namely home runs and strikeouts).” A league-average BABIP is around .300, meaning if a pitcher has a BABIP of .250, opposing batters are recording hits in the field of play less often than usual, and if a pitcher has a .350 BABIP, then opposing batters are recording hits in the field of play more often than usual. A pitcher’s style or a hitter’s tendencies can explain some of that, but extreme BABIPs in either direction generally hint that a pitcher (or hitter) is getting more than his share of good or bad luck.
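Here is a quick Python sketch of that definition, assuming the standard formula BABIP = (H − HR) / (AB − K − HR + SF), where sacrifice flies count as balls in play. For a pitcher, the inputs are the totals he has allowed. The function name and the sample numbers are hypothetical.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int = 0) -> float:
    """Batting average on balls in play.

    Home runs and strikeouts are removed because fielders never get
    a chance to make a play on them.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical line allowed by a pitcher: 150 H, 20 HR, 650 AB, 200 K, 5 SF.
sample_babip = babip(hits=150, home_runs=20, at_bats=650,
                     strikeouts=200, sac_flies=5)
print(f"{sample_babip:.3f}")  # 0.299
```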

When there is a gap between a pitcher’s ERA and FIP, the first statistic to consider next is BABIP, so that you can determine if the pitcher is getting (un)lucky. I’ll show you a couple of examples of FIP and BABIP at work.

First, let’s revisit 2018 Jacob deGrom. To go along with deGrom’s 10-9 record and 1.70 ERA, he accumulated a FIP of 1.99, which tells us he was effective at striking hitters out, he limited walks and he kept the ball in the ballpark. Further, his BABIP was .281, which tells me that on top of controlling what he could control, he wasn’t benefiting from any outlandish luck with batted balls. deGrom is an example of pure domination, and the sub-2.00 ERA and FIP and close-to-average BABIP back that up.

Now let’s take a look at a pitcher in 2025 whose ERA is perhaps being helped by the defense behind him, or who is just getting plain lucky. Freddy Peralta leads the league with a 15-5 record this season and he sports a 2.68 ERA to go along with that. If we go a step further, his FIP sits at 3.68, a full run higher than his ERA. In other words, he’s been limiting runs more effectively than his ability to control what he can control (missing bats, avoiding walks and limiting home runs) would suggest. His BABIP is .245, meaning that batted balls (excluding home runs) are falling in for hits just 24.5% of the time, compared to the league average of ~30% (.300). Is Peralta getting lucky? Maybe a little. Is Peralta benefiting from a good defense? Definitely. The 2025 Brewers are statistically one of the top defensive teams, ranking top 5 in multiple metrics. A talented pitcher like Peralta undoubtedly looks even better on the surface because of the defense behind him (and perhaps some good fortune).

Finally, let’s look at a pitcher in 2025 who, on the flip side of Peralta, is being hurt by his defense and may be on the wrong side of luck. Jesús Luzardo of the Phillies has made 26 starts, winning 12 games and losing 6. His ERA stands at 4.10. His FIP? 3.04. What can we take away from these two numbers alone? He’s been better at striking hitters out and limiting walks and home runs than his ERA would lead us to believe. Now, if we consider his BABIP, which sits at .338, what do we question? His ERA! Luzardo’s got the second-highest BABIP among qualified pitchers this year (behind Logan Webb), which lets us know that a 4.10 ERA may not be an honest representation of the pitcher. Perhaps he’ll improve toward the tail end of this year and those numbers will begin to creep closer to each other, but it also may take time (years) to even out. It’s worth noting Philly’s not great defensively, so he should probably hope for some better luck coming his way.
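The way I read those three pitchers can be sketched as a simple rule of thumb in Python. To be clear, the cutoffs here (a half-run ERA-FIP gap, 30 points of BABIP either side of .300) are my own illustrative assumptions, not any official standard, and the function name is made up. The ERA, FIP and BABIP figures are the ones quoted above.

```python
def read_pitcher(era: float, fip: float, babip: float,
                 league_babip: float = 0.300,
                 gap_cutoff: float = 0.50,
                 babip_cutoff: float = 0.030) -> str:
    """Rough first read on an ERA-FIP gap, using BABIP as context.

    All cutoffs are illustrative assumptions, not established thresholds.
    """
    gap = era - fip
    if abs(gap) < gap_cutoff:
        return "ERA and FIP roughly agree"
    if gap < 0 and babip < league_babip - babip_cutoff:
        return "ERA may be helped by defense or luck"
    if gap > 0 and babip > league_babip + babip_cutoff:
        return "ERA may be hurt by defense or luck"
    return "gap not explained by BABIP alone"


# (ERA, FIP, BABIP) as quoted in this post.
pitchers = {
    "deGrom 2018": (1.70, 1.99, 0.281),
    "Peralta 2025": (2.68, 3.68, 0.245),
    "Luzardo 2025": (4.10, 3.04, 0.338),
}
for name, line in pitchers.items():
    print(name, "->", read_pitcher(*line))
```

A rule like this is only a starting point. It flags who deserves a closer look, and the closer look (team defense, batted-ball quality, sample size) is where the real answer lives.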

To summarize…

We learned that while many fans rely on wins, losses, and ERA, those numbers alone rarely describe a pitcher accurately. Using FIP to determine how well a pitcher is controlling the things he can control (strikeouts, walks, home runs) gives us a better idea of how the pitcher should be performing in the future. One step further, BABIP adds context to that FIP number and could further explain the good (or bad) luck a pitcher is receiving. These are just a couple of numbers on a growing list of data points to consider with pitchers, and I’ll be sure to dive even deeper in another post.
