2020 Fantasy Baseball Draft Prep: Searching for breakouts, busts with ACES metric

Last year, on these very pages, we debuted ACES — a measure that distills the quality of a pitcher’s raw stuff into a single metric. As we’re trying to find sleepers to target and busts to avoid on Draft Day, this is another path to take. 

ACES, which stands for “Arsenal Combinations Estimate Scores,” was the next iteration of previous per-pitch analysis birthed by some of the brightest pitching minds in our game: Eno Sarris, Harry Pavlidis, Daniel Schwartz, Alex Chamberlain, Rylan Edwards, Dan Aucoin and, I’m sure, many before them who served as inspiration.

The primary difference with — and inspiration behind — ACES was the inclusion of Command+, a metric debuted by STATS in 2017 as a way to measure the intent, or command, of a pitcher. While previous arsenal score work included some combination of velocity, movement, whiffs, grounders and/or popups, the introduction of Command+ enabled ACES to develop a fuller picture of a pitcher’s stuff.

Initial tests indicated that ACES was competitively predictive of future ERA while being stickier season-to-season than any other public metric available — despite the small sample, those were promising results. And there were some successes! Charlie Morton, Luis Castillo and Frankie Montas, to name a few, all finished in the top 15% by ACES following the 2018 season and went on to healthily outearn their average draft position (ADP) in Fantasy drafts last season.


We’ll get to some of the updates I made to the ACES metric and how they can help refine what we’re looking for, but I know you want to see the results first — here are the top and bottom 15% starting pitchers by ACES from 2019:

(The full 2017-2019 leaderboard, which includes both ACES and ACES+ by pitch type for both SP and RP, is available on SportsLine.)

There will be more analysis to come in future pieces, but here are some initial “quick hit” takeaways:

  • Sonny Gray as the new ACES poster child!? Although scroll down one line and there he is: Gerrit Cole in the 100th percentile for both ACES and ACES+. Where was this in Pittsburgh? No, I’m not salty at all that he spurned his hometown “Los Angeles” Angels of Anaheim.
  • To me, the holy grail is finding pitchers with both the elite stuff and results, who also possess multiple pitches — here are the 18 names in the top 15% of ACES and top 10% of ACES+ from 2019 (min. 40 IP and three pitches): Cole, Walker Buehler, Jacob deGrom, Zack Wheeler, Lucas Sims (!), Chad Green (reliever), Justin Verlander, Mike Clevinger, Chris Sale, Lance Lynn, Rich Hill, Max Scherzer, Noah Syndergaard, Stephen Strasburg, Brandon Woodruff, Jose Urena (!?) and Blake Snell. Tyler Glasnow and Freddy Peralta (mostly as a reliever) would have made the cut if they had three pitches in our sample. Now that’s a fun list! And, for the most part, includes who you’d expect to see — the cream of the crop.
  • Kyle Hendricks is about as close to a unicorn as you’ll see in today’s game — scroll down to the bottom and you’ll see him in the bottom 10% in ACES (stuff) yet top 10% in ACES+ (outcomes plus stuff)! Nearly every other pitcher in the bottom 15% of ACES is also struggling to generate profitable results (look at all that red).
  • A note on SP/RP classifications and sample size: A few surprising names pop up that probably don’t belong in this starting pitcher conversation, such as Jonathan Loaisiga (hello again), Chad Green and Freddy Peralta. Blame it on the Rays and their opener strategy; it’s messing with our pitching role classifications (I just used whoever FanGraphs identified as a “starter”). Also, when looking at a pitcher’s rank, be sure to note the sample to place that score into context (see the innings pitched and percentage of arsenal included columns). Still, it’s good to take note of some of these names, small sample be damned.


Not to be overlooked, there were misses last season too (looking at you, Nick Pivetta). In particular, one of the challenges of devising ACES was determining just how much to weight each component of velocity, movement and command. Given Command+ debuted in 2017, there were only two seasons’ worth of data to test. Now armed with another season’s worth of Command+ and Statcast data, the ACES mission this offseason was twofold:

  1. Identify additional predictive measures of a pitcher’s stuff, beyond velocity, movement and command
  2. Determine the proper weighting of those measures for each pitch type (e.g., fourseam, sinker, cutter, etc.)

Upon testing each Statcast measure and optimizing weightings of each pitch type for predictiveness, here are the new average weightings (weightings differ by pitch type):

  • Release Spin — 23%
  • Effective Velocity — 23%
  • V Mov — 17%
  • Command — 15%
  • H Mov — 10%
  • Release Extension — 9%
  • Fastball Velocity Gap — 25% (changeup only)

On average, the weightings of velocity, movement and command aren’t far off from where they were previously. The predictiveness of spin rate and release extension, however, came as more of a surprise.

In the case of spin rate, velocity-adjusted spin generates more whiffs (good) and a higher spin rate is correlated with higher velocities (also good). And since we’re using spin in season-n to predict a pitcher’s performance in season-n+1, perhaps a pitcher is harnessing their superior spin ability in the offseason to generate more velocity, movement or something else that’s driving better performance the next season. Somewhat surprisingly, velocity-adjusted spin — such as Bauer Units — didn’t test as more predictive of future performance than raw spin rate.

Additionally, according to Statcast, a longer release extension essentially shortens the distance between a pitcher and home plate, helping their pitches “play up.” While the effects of release extension may require further testing, it held predictive ability for each pitch type.


So what does it all mean, Basil? As with any metric, ACES ought to be able to do two things:

  • Help us better predict future performance
  • Perform reliably season-to-season (often referred to as “stickiness”)

Let’s test those two elements, beginning with predictiveness. Here’s ACES’ ability to predict future ERA among starting pitchers who pitched at least 40 innings and reached minimum pitch thresholds per pitch type (see methodology section below) in consecutive seasons from 2017-2019 (n = 68 season pairs):

On the surface, that might not look great: an r-squared (r²) of just 0.12. However, two things to note:

First, pitchers who grade well by ACES in one season generally tend to have lower ERAs the following season. That’s good and what we’d expect!

Second, ERA is notoriously fickle and extremely difficult to predict, so in context, that’s actually quite respectable! For instance, for the same sample of players, ACES is more than 1.5x as predictive of future ERA as FIP or ERA itself. While it falls a bit short of xFIP (0.14), SIERA (0.15) and K-BB% (0.16), the next chart shows where ACES really excels:

Now we’re talking: with a season-to-season r² in excess of 0.70, ACES is the stickiest public pitching metric I’ve tested. In other words, if you score well by ACES in one season, you likely “own” those skills and will perform well by the metric in the next season. It’s nearly 10x stickier than ERA (0.08) and 1.5-2x stickier than SIERA (0.36), xFIP (0.40) and K-BB% (0.47).
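For anyone who wants to run the same two checks on their own numbers, here is a minimal sketch of how an r² like the ones above can be computed. It assumes a simple one-variable linear fit, where r² equals the squared Pearson correlation. The function name and the toy data are hypothetical and are not real ACES scores or ERAs.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Toy season pairs (made-up numbers for illustration only).
aces_year_n = [1.2, 0.4, -0.3, -1.1, 0.8]
aces_year_n1 = [1.0, 0.5, -0.2, -0.9, 0.6]
era_year_n1 = [3.10, 3.90, 4.20, 4.80, 3.60]

# Predictiveness: the metric in season n against ERA in season n+1.
predictiveness = r_squared(aces_year_n, era_year_n1)
# Stickiness: the metric in season n against itself in season n+1.
stickiness = r_squared(aces_year_n, aces_year_n1)
```

The only difference between the two tests is what goes on the y-axis: next season's ERA for predictiveness, next season's value of the same metric for stickiness.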

So far, we’ve only discussed “raw stuff” — qualities a pitcher possesses rather than outcomes they generate. This is partly by design and remains the heart of ACES, in large part enabling its elite season-to-season stickiness. However, what if, in addition to a pitcher’s raw stuff, we also incorporated some key outcomes? For example, arsenal scores of yesteryear incorporated outcomes like whiffs, popups and grounders. While we may lose some stickiness, perhaps we can improve predictiveness by incorporating a pitcher’s ability to generate things like whiffs, strikeouts and poor contact?

Enter: ACES+ (“ACES Plus”). The “plus” being for the core of ACES, raw stuff, plus outcomes that help us better predict future performance.

After testing the long list of outcome metrics available from Statcast for each pitch type, the following weightings were, on average, found to be both relatively sticky and predictive of future performance (weightings differ by pitch type):

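(Table: average ACES+ weightings by component. Raw stuff rows: Release Spin, V Mov, H Mov, FB Velo Gap. Outcome rows: Avg EV, Poorly Topped, Poorly Under, Called Strike %, Barrel %. Each side ends with a “Total (average)” row; the individual weight values are not recoverable here.)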

While it varies by pitch, outcomes were, on average, weighted 62% compared to 38% for raw stuff. If the goal is to predict future performance, it makes sense that key elements of previous performance would be weighted more heavily. Unsurprisingly, things related to generating strikeouts — strikeout rate (K%) and swinging strike rate (SwStr%) — were more predictive of future performance than contact quality metrics. However, it was a bit surprising to find strikeout rate — even at the per-pitch type level — testing as more predictive and, in many cases, stickier than swinging strike rate.
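To make that 62/38 split concrete, here is a rough sketch that blends a raw-stuff score with an outcomes score at the average split. This is a simplification under stated assumptions: the function name and inputs are hypothetical, and the actual weightings are set per component and differ by pitch type.

```python
# Average shares quoted above; the real weights vary by pitch type.
RAW_STUFF_SHARE = 0.38
OUTCOMES_SHARE = 0.62


def aces_plus_sketch(raw_stuff_z, outcomes_z):
    """Blend a raw-stuff z-score and an outcomes z-score at the average split."""
    return RAW_STUFF_SHARE * raw_stuff_z + OUTCOMES_SHARE * outcomes_z


# Hypothetical pitcher with below-average stuff but strong outcomes.
blended = aces_plus_sketch(raw_stuff_z=-1.3, outcomes_z=1.5)
```

Because outcomes carry the larger share, a pitcher like this one can grade poorly on stuff alone and still land above average once outcomes are folded in.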

What about the testing results for ACES+ — how does it fare?

Look at that! By incorporating outcomes, we’ve improved our r² to future ERA by over 30%, essentially in line with or higher than the best public pitching metrics we have in K-BB%, xFIP and SIERA.

Let’s check the stickiness:

While there’s still a clear season-to-season relationship with ACES+ (r² = 0.32), we definitely lost some stickiness by incorporating outcomes. Still, it’s a much more predictive and sticky measure than, say, ERA.

Last note on ACES and ACES+ testing:

Somewhat oddly, ACES and ACES+ perform much better when unweighted. If you’ll recall in previous ACES and arsenal score iterations, a pitcher’s score for each pitch is weighted by their pitch mix — a sensible approach introduced by Alex Chamberlain (see methodology section below for more details on how ACES is calculated). ACES and ACES+ have adopted the same approach in their methodology — after all, if a pitcher throws a certain pitch more, it should be weighted more (and vice versa). As such, all tests shown above are the weighted versions.

However, it’s worth noting that the unweighted version — that is, zero pitch mix weights applied — tested as significantly more predictive than the weighted version and other pitching metrics (e.g., K-BB%, xFIP, SIERA, etc.):

(Table: r² to future ERA for weighted and unweighted ACES and ACES+ alongside other pitching metrics.)

(Stickiness remained roughly similar, in the case of ACES, or improved, in the case of ACES+)

While a small sample size caveat applies, in some respects, this makes sense — when it’s purely cumulative and not weighted, pitchers who have several good pitches benefit. Additionally, while pitch mix is relatively sticky season-to-season, weighting ACES by pitch mix introduces additional variability.
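The difference between the two versions is easy to see in a small sketch. The pitch scores and usage shares below are made up, and the helper name is hypothetical; it only illustrates why a purely cumulative total rewards a deep arsenal more than a usage-weighted one does.

```python
def aggregate(pitch_scores, pitch_mix=None):
    """Sum per-pitch scores; if pitch_mix is given, weight each by usage share."""
    if pitch_mix is None:
        return sum(pitch_scores.values())
    return sum(score * pitch_mix[pitch] for pitch, score in pitch_scores.items())


# Hypothetical pitcher: three above-average pitches, one thrown far more often.
scores = {"fourseam": 0.5, "slider": 1.0, "changeup": 0.8}
mix = {"fourseam": 0.60, "slider": 0.25, "changeup": 0.15}

weighted = aggregate(scores, mix)  # 0.5*0.60 + 1.0*0.25 + 0.8*0.15
unweighted = aggregate(scores)     # 0.5 + 1.0 + 0.8
```

In the weighted version the pitcher's best two pitches contribute little because he rarely throws them, and any change in his pitch mix next season moves the score even if the pitches themselves don't change.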

I still prefer to use the weighted version, but now I want to know both figures — as such, I’ve included both weighted and unweighted results for both ACES and ACES+.


While ACES and ACES+ give us a unique view of a pitcher — perhaps as the quantification and manifestation of Rob Friedman’s fire @PitchingNinja Twitter feed — let’s be clear: they’re not meant to supplant other pitching metrics. The best pitching analysis will always combine a variety of sources — a pitcher’s stuff, K-BB%, ERA estimators, recent velocity readings and pitch mix shifts, other “macro” factors, new compelling research, etc.

Having said that, it sure is nice to have pitchers with stuff.

APPENDIX: Acknowledgements, Future Improvements and Methodology Details


My name may be on the article, but ACES was a collaborative endeavor. To thank just a few of many collaborators who made a significant impact:

Many thanks to Eno (@EnoSarris) for providing the inspiration (same for the arsenal score trailblazers!) and the connection to STATS and Command+. ACES would be infinitely worse without STATS and Command+ and, frankly, likely wouldn’t exist. I’m much indebted to Jonathan Starr (@JonathanEStarr), who selflessly offered his time and technical wizardry to help me more quickly pull all of the necessary Statcast data (also, despite several tests, if the data pulls are off, you know who to message 😊). Dan Aucoin’s (@dan_aucoin13) public work at Driveline and willingness to humor my rudimentary questions via Twitter DMs were invaluable. Last, but certainly not least, thanks to Chris (@CTowersCBS) and the CBS team for giving ACES a home.


Similar to last season’s write up, there were a few things that were ever present in my mind as I was conducting this season’s analysis — just not enough time to incorporate them. Perhaps for next season:

  • Good old data! Another season’s worth of data can only help improve our weightings and reduce reliance on just three years’ worth of data, including only two season pairs (i.e., 2017-2018 and 2018-2019)
  • Potential overfitting of current model — work with those smarter than myself to use random out of sample data to test ACES and determine predictiveness and stickiness
  • Dive deeper into “raw stuff” elements that were somewhat surprisingly predictive — at least to the extent that they were — such as spin rate and release extension (not to mention entirely revisiting analysis of the splitter)


This piece was already long without including the methodology, but for those interested, I’ve included the ACES calculation process below. Hat-tip to Sarris and predecessors, as the methodology is eerily similar to previous arsenal score iterations:

  • Pulled 2017, 2018 and 2019 data for each pitch type from Baseball Savant (H/T to Jonathan)
  • For testing (e.g., weightings) and benchmark (e.g., average velocity, movement, command, etc.) purposes, only considered starting pitchers who had pitched a minimum threshold of pitches (excluded knuckleballs and screwballs):
  1. Minimum 500 pitches: Fourseam
  2. 250: Sinker, Curve, Slider and Change
  3. 150: Cutter
  4. 75: Splitter
  • Utilized STATS Command+ metric to measure command
  • In the final percentile calculations, included only pitchers who pitched at least 5 innings; percentiles are calculated by season and pitcher role (e.g., SP’s scores from 2017 are compared against each other, etc.)
  • Within each pitch type, calculated each pitcher’s z-score for the components included (e.g., velocity, movement, command, etc.) and summed them using the most predictive weights for each pitch type* (determined in testing phase mentioned above)
  • Weighted the z-scores by pitch mix frequency and added the weighted z-scores to create each pitcher’s ACES metric
  • Z-score calculation assumptions (based on a mix of testing and previous research):
  1. Higher velocity, extension, spin and command are better (except for the splitter; ahh, the splitter, what an, um, fun pitch to analyze)
  2. Higher velocity gaps between fastball and changeup are assumed to be better
  3. Vertical “drop” (i.e., lower number) is good, except for four-seamers (assumed “rise” is better, i.e., higher number)
  4. Absolute horizontal movement is good, whether arm-side or glove-side (i.e., higher absolute value)
  5. Splitter: based on testing, less velocity, spin and horizontal movement were assumed to be better (my gosh, more research is needed on the splitter)
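As a rough illustration of the per-pitch step described above, the sketch below computes z-scores for each component within one pitch type, flips the sign where “lower is better,” takes the absolute value of horizontal movement, and sums with weights. The component names, weights and sample rows are all hypothetical, and using the population standard deviation is an assumption rather than a detail from the methodology.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score each value against the group (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def pitch_scores(rows, weights, directions, absolute=("h_mov",)):
    """Weighted sum of direction-adjusted component z-scores, one per pitcher.

    rows: one dict of component values per pitcher, all for the same pitch type.
    directions: +1 if a higher number is better for that component, -1 if lower.
    absolute: components judged on magnitude only (arm-side or glove-side).
    """
    totals = [0.0] * len(rows)
    for component, weight in weights.items():
        raw = [row[component] for row in rows]
        if component in absolute:
            raw = [abs(v) for v in raw]
        for i, z in enumerate(z_scores(raw)):
            totals[i] += weight * directions[component] * z
    return totals


# Three hypothetical curveballs (made-up values and weights).
curves = [
    {"velo": 80.0, "v_mov": -9.0, "h_mov": 6.0},
    {"velo": 78.0, "v_mov": -6.0, "h_mov": -8.0},
    {"velo": 76.0, "v_mov": -3.0, "h_mov": 4.0},
]
weights = {"velo": 0.5, "v_mov": 0.3, "h_mov": 0.2}
directions = {"velo": +1, "v_mov": -1, "h_mov": +1}  # more drop = lower v_mov = better

scores = pitch_scores(curves, weights, directions)
```

For a four-seamer, the `v_mov` direction would flip to +1 (more “rise” is better), and for the splitter the velocity, spin and horizontal movement directions would flip to -1, per the assumptions listed above.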


Unlike last season, when we only had two seasons’ worth of data and I said “this is admittedly more art than I’d prefer,” this time I used more rigorous quantitative testing to determine how important each of the included components was to the success of a pitch. Nevertheless, it’s not a trivial question with a simple answer. I based the weightings on predictiveness to a mix of future expected wOBA (xwOBA) and strikeout rate (K%). This was paired with findings from previous analysis by Sarris, which found that whiffs were roughly two times more correlated with ERA estimators than grounders. Additional analysis on the best curveballs found that velocity was the key for whiffs while drop was the key for grounders. Based on those findings, it appears we should weight velocity two times more than movement. FanGraphs community research found similar importance of velocity. I primarily leaned on the findings from my testing but incorporated these findings as well; this is where the “art” came in. More data can only help further refine these weightings.
