Basenerd Research

Predicting Every At-Bat

How Our XGBoost Matchup Model Uses 86 Features, Pitcher Arsenals, and Real-Time Bayesian Updating to Predict PA Outcomes

Every plate appearance in baseball is a collision of tendencies. A batter who can’t lay off breaking balls. A pitcher whose slider generates 40% whiffs. A park that inflates home runs. A bases-loaded, two-out situation that changes everything. And a batter who’s been scorching the ball for two weeks straight.

We built a machine learning model that synthesizes all of these factors into a single prediction: what is the probability of every possible outcome of this plate appearance? The model runs live on our gamecast during every at-bat and powers our pregame predictions page, where you can see projected outcomes for every batter in both lineups before the first pitch is thrown.

What the Model Predicts

For each batter-pitcher matchup, the model outputs probabilities across nine outcome classes:

Outcome	Description
K	Strikeout
OUT	Ball in play, recorded out (including DPs, sac flies)
BB	Walk
HBP	Hit by pitch
IBB	Intentional walk
1B	Single
2B	Double
3B	Triple
HR	Home run

From these raw probabilities, we derive familiar summary stats: xAVG (expected batting average), xSLG (expected slugging), xOBP (expected on-base percentage), along with K% and BB%.
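
As a rough illustration of how the summary stats fall out of the nine probabilities, here is a minimal sketch. The input values are the league-wide 2025 actual rates from the calibration table later in this article, used as a stand-in for a real prediction. The function name `summary_stats` and the at-bat approximation are assumptions made for this example, not a description of the production code.

```python
# Outcome probabilities for one plate appearance.
# Stand-in values: league-wide 2025 rates, not an actual model prediction.
probs = {
    "K": 0.222, "OUT": 0.464, "BB": 0.081, "HBP": 0.011, "IBB": 0.003,
    "1B": 0.143, "2B": 0.042, "3B": 0.003, "HR": 0.031,
}

def summary_stats(p):
    """Collapse a nine-class probability vector into rate-style summary stats."""
    hit = p["1B"] + p["2B"] + p["3B"] + p["HR"]
    free_pass = p["BB"] + p["HBP"] + p["IBB"]
    # Assumption: every PA that is not a walk or HBP counts as an at-bat.
    # The OUT class also contains sac flies, so this slightly overstates at-bats.
    at_bat = 1.0 - free_pass
    total_bases = p["1B"] + 2 * p["2B"] + 3 * p["3B"] + 4 * p["HR"]
    return {
        "xAVG": hit / at_bat,
        "xSLG": total_bases / at_bat,
        "xOBP": hit + free_pass,  # denominator is the whole PA (probabilities sum to 1)
        "K%": p["K"],
        "BB%": p["BB"],           # assumption: unintentional walks only
    }

stats = summary_stats(probs)
```

Because the classes are mutually exclusive and sum to 1, each summary stat is just a ratio of sums over the same vector, so all five stay consistent with one another for any given matchup.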

These are not season-long projections. They are matchup-specific probabilities – how this particular batter is expected to perform against this particular pitcher, in this park, in this inning, with these runners on base, and with both players’ recent performance factored in.

The Model Architecture

The model is an XGBoost gradient-boosted tree classifier trained on 730,089 plate appearances from the 2021-2024 MLB seasons and tested on 182,840 PAs from 2025. It uses the multi:softprob objective, which outputs a probability for each of the nine classes simultaneously, so every prediction sums to 1.

Key training parameters:

436 boosting rounds (early-stopped from 500 max)
Max depth: 5 – enough depth to capture pitch-type interactions
Learning rate: 0.05 with 80% subsampling
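Min child weight: 100 – blocks splits that would create small leaves. XGBoost measures this weight as a sum of second-order gradients rather than a raw row count, so each leaf covers at least 100 plate appearances and typically more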
L1/L2 regularization (alpha=0.1, lambda=1.0)
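
For readers who want to reproduce a similar setup, the parameters above map onto XGBoost's native parameter names roughly as follows. This is a sketch, not our training script: the early-stopping patience is not stated in this article, so `EARLY_STOPPING_ROUNDS` is a hypothetical value, and reading "80% subsampling" as row subsampling is an assumption.

```python
# Hyperparameters listed above, expressed with XGBoost's native parameter names.
PARAMS = {
    "objective": "multi:softprob",  # one probability per class
    "num_class": 9,                 # K, OUT, BB, HBP, IBB, 1B, 2B, 3B, HR
    "max_depth": 5,
    "eta": 0.05,                    # learning rate
    "subsample": 0.8,               # assumption: "80% subsampling" means row subsampling
    "min_child_weight": 100,
    "alpha": 0.1,                   # L1 regularization
    "lambda": 1.0,                  # L2 regularization
    "eval_metric": "mlogloss",      # multiclass log loss
}
MAX_ROUNDS = 500
EARLY_STOPPING_ROUNDS = 25          # hypothetical: the patience value is not stated

def train(dtrain, dvalid):
    """Train with early stopping; dtrain and dvalid are xgboost.DMatrix objects."""
    import xgboost as xgb  # imported lazily so the config can be inspected without it
    return xgb.train(
        PARAMS,
        dtrain,
        num_boost_round=MAX_ROUNDS,
        evals=[(dvalid, "valid")],
        early_stopping_rounds=EARLY_STOPPING_ROUNDS,
    )
```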

The model achieves a test log loss of 1.4419 on the held-out 2025 season.
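
A log loss figure is hard to read without a reference point. One simple reference is a naive baseline that predicts the league-wide outcome rates for every plate appearance. The sketch below computes that baseline from the rounded 2025 actual rates in the calibration table later in this article. The function name is made up for this example, and the result is only as precise as those rounded rates.

```python
import math

# Actual 2025 outcome rates from the calibration table (rounded to 0.1%).
BASE_RATES = {
    "K": 0.222, "OUT": 0.464, "BB": 0.081, "1B": 0.143, "2B": 0.042,
    "3B": 0.003, "HR": 0.031, "HBP": 0.011, "IBB": 0.003,
}

def base_rate_log_loss(rates):
    """Expected log loss of predicting the same base-rate vector for every PA.

    When the predicted vector equals the true class frequencies, the average
    negative log-likelihood is the entropy of that distribution (in nats).
    """
    total = sum(rates.values())
    return -sum((r / total) * math.log(r / total) for r in rates.values())

baseline = base_rate_log_loss(BASE_RATES)
model_log_loss = 1.4419  # reported test log loss
improvement = baseline - model_log_loss
```

On these rounded rates the baseline works out to roughly 1.50, so the reported 1.4419 is a modest but real improvement over predicting the same distribution for everyone. Small gaps like this are expected for single plate appearances, where most of the outcome is irreducible randomness.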

The 86 Features

The model ingests 84 numeric features and 2 categorical features, organized into eight groups. This is a significant expansion from the original 45-feature model, driven by three new feature families: pitch-type-specific batter performance, pitcher arsenal breakdowns, and rolling 14-day recent form.

Batter Profile (17 features)

Full-season aggregate stats capturing the batter’s overall offensive identity:

Feature	Description	Why It Matters
`bat_k_pct`	Strikeout rate	Primary driver of K probability
`bat_bb_pct`	Walk rate	Primary driver of BB probability
`bat_whiff_rate`	Swing-and-miss rate	Bat-to-ball skill
`bat_chase_rate`	Chase rate (swings outside zone)	Discipline
`bat_zone_swing_rate`	Swing rate on in-zone pitches	Aggressiveness
`bat_zone_contact_rate`	Contact rate on in-zone swings	Contact quality
`bat_avg_ev`	Average exit velocity	Raw power
`bat_avg_la`	Average launch angle	Fly ball/ground ball tendency
`bat_barrel_rate`	Barrel rate	Optimal contact frequency
`bat_hard_hit_rate`	Hard-hit rate (95+ mph)	Solid contact
`bat_sweet_spot_rate`	Sweet spot rate (8-32 degree LA)	Productive contact
`bat_gb_rate`	Ground ball rate	Batted ball profile
`bat_fb_rate`	Fly ball rate	Batted ball profile
`bat_hr_per_fb`	HR per fly ball	Power efficiency
`bat_iso`	Isolated power	Extra-base hit frequency
`bat_babip`	BABIP	Contact quality + speed
`bat_xwoba`	Expected wOBA	Overall expected production

Batter Platoon Split (7 features)

Same-hand or opposite-hand splits capturing how the batter performs against the pitcher’s throwing arm:

Feature	Description
`bat_plat_k_pct`	K% vs this handedness
`bat_plat_bb_pct`	BB% vs this handedness
`bat_plat_whiff_rate`	Whiff rate vs this handedness
`bat_plat_chase_rate`	Chase rate vs this handedness
`bat_plat_avg_ev`	Exit velocity vs this handedness
`bat_plat_barrel_rate`	Barrel rate vs this handedness
`bat_plat_xwoba`	xwOBA vs this handedness

The platoon K% is the single most important feature in the entire model (12.3% of total feature importance). How a batter handles same-side or opposite-side pitching is the strongest signal for predicting PA outcomes.

Batter vs Pitch-Type Category (15 features) – NEW

How the batter performs against each category of pitch: fastballs (4-seam, sinker, cutter), breaking balls (slider, curveball, sweeper, slurve), and offspeed (changeup, splitter).

For each category, we track five rate stats:

Stat	Per Category
`bvpt_whiff_rate_{cat}`	Whiff rate against this pitch category
`bvpt_chase_rate_{cat}`	Chase rate against this pitch category
`bvpt_zone_contact_rate_{cat}`	Zone contact rate against this pitch category
`bvpt_hard_hit_rate_{cat}`	Hard-hit rate against this pitch category
`bvpt_xwoba_{cat}`	xwOBA against this pitch category

This is crucial because not all batters struggle with the same pitches. A batter who mashes fastballs but whiffs on 40% of his swings against breaking balls is a very different matchup against a slider-heavy pitcher than against a fastball-dominant one.

Pitch-Weighted Composite (5 features) – NEW

The model’s most sophisticated feature group. For each batter stat, we compute a weighted average based on the opposing pitcher’s actual pitch mix:

bvpt_w_whiff_rate = (batter_whiff_vs_FB × pitcher_FB_usage) +
                    (batter_whiff_vs_BRK × pitcher_BRK_usage) +
                    (batter_whiff_vs_OFF × pitcher_OFF_usage)

If a batter has a .380 xwOBA against fastballs but .200 against breaking balls, and the opposing pitcher throws 60% breaking balls, this composite captures the true matchup quality in a single number.
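
The weighting formula can be sketched in a few lines. The .380 and .200 xwOBA values and the 60% breaking-ball usage come from the example above. The offspeed xwOBA, the rest of each pitch mix, and the normalization step are assumptions added to make the example complete.

```python
def pitch_weighted(batter_by_cat, pitcher_usage):
    """Weight a batter's per-category stat by the pitcher's pitch mix."""
    total_usage = sum(pitcher_usage.values())
    # Assumption: normalize in case usage rates don't sum to exactly 1.
    return sum(batter_by_cat[c] * pitcher_usage[c] for c in pitcher_usage) / total_usage

# xwOBA by category: FB and BRK come from the example above; OFF is a made-up value.
batter_xwoba = {"FB": 0.380, "BRK": 0.200, "OFF": 0.300}
# 60% breaking balls from the example; the FB/OFF split is assumed.
slider_heavy = {"FB": 0.30, "BRK": 0.60, "OFF": 0.10}
# A hypothetical fastball-dominant pitcher for contrast.
fastball_heavy = {"FB": 0.70, "BRK": 0.20, "OFF": 0.10}

vs_slider_heavy = pitch_weighted(batter_xwoba, slider_heavy)
vs_fastball_heavy = pitch_weighted(batter_xwoba, fastball_heavy)
```

With these inputs the same batter carries a composite xwOBA of .264 against the slider-heavy pitcher and .336 against the fastball-heavy one, which is the kind of gap a season-long average hides.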

Pitcher Arsenal Profile (22 features) – EXPANDED

We break the pitcher’s profile into three tiers:

Aggregate stats (9 features): Overall BNStuff+, BNCtrl+, velocity, whiff rate, chase rate, zone rate, xwOBA, pitch count, workload.

Category usage (3 features): What percentage of pitches are fastballs, breaking balls, and offspeed. A pitcher who throws 70% breaking balls creates a very different matchup than one who’s 70% fastballs.

Top-3 pitch stats (12 features): For the pitcher’s three most-used pitches, we include individual usage, velocity, whiff rate, and BNStuff+. This lets the model learn that a pitcher whose best pitch is a 96 mph 4-seamer with 130 BNStuff+ is a different animal than one whose best pitch is a sweeper.

Feature	Description
`p_pitch1_usage`	Usage rate of primary pitch
`p_pitch1_velo`	Velocity of primary pitch
`p_pitch1_whiff`	Whiff rate of primary pitch
`p_pitch1_stuff`	BNStuff+ of primary pitch
`p_pitch2_*`	Same stats for secondary pitch
`p_pitch3_*`	Same stats for tertiary pitch

Recent Form – Rolling 14-Day (11 features) – NEW

Season-long stats are a starting point, but they miss hot and cold streaks. A batter who’s hit .400 with a .450 xwOBA over the last two weeks is a different threat than his .260 season line suggests.

Batter recent form (6 features):

Feature	Description
`bat_r14_k_pct`	K rate over last 14 days
`bat_r14_bb_pct`	Walk rate over last 14 days
`bat_r14_xwoba`	xwOBA over last 14 days
`bat_r14_barrel_rate`	Barrel rate over last 14 days
`bat_r14_whiff_rate`	Whiff rate over last 14 days
`bat_r14_chase_rate`	Chase rate over last 14 days

Pitcher recent form (5 features):

Feature	Description
`p_r14_k_pct`	K rate over last 14 days
`p_r14_bb_pct`	Walk rate over last 14 days
`p_r14_xwoba`	xwOBA allowed over last 14 days
`p_r14_whiff_rate`	Whiff rate over last 14 days
`p_r14_chase_rate`	Chase rate over last 14 days

These are computed as rolling windows from our Statcast database. For training data, we use a strict look-back approach – each PA only sees form data from before that game date, preventing look-ahead bias. If a player has fewer than 20 pitches in their 14-day window (injury return, early season), we fall back to league averages.
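
The look-back rule and the 20-pitch fallback can be sketched as follows for a single stat. The tuple layout of `pitches` and the function name are hypothetical. The production features are computed from our Statcast database, not from in-memory lists.

```python
from datetime import date, timedelta

MIN_PITCHES = 20
WINDOW_DAYS = 14

def rolling_whiff_rate(pitches, game_date, league_avg):
    """Whiff rate over the 14 days strictly before game_date.

    `pitches` is a list of (date, swung, whiffed) tuples for one player;
    this record layout is hypothetical.
    """
    start = game_date - timedelta(days=WINDOW_DAYS)
    # Strict upper bound: nothing from game_date itself is visible.
    window = [p for p in pitches if start <= p[0] < game_date]
    if len(window) < MIN_PITCHES:
        return league_avg  # too little data: fall back to league average
    swings = sum(1 for _, swung, _ in window if swung)
    whiffs = sum(1 for _, swung, whiffed in window if swung and whiffed)
    return whiffs / swings if swings else league_avg
```

The strict `< game_date` bound is what prevents look-ahead bias in training: a plate appearance never sees pitches from its own game, let alone later ones.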

Park Factors (2 features)

Feature	Description
`park_run_factor`	Overall run factor (>1 = hitter-friendly)
`park_hr_factor`	HR-specific factor (Coors > Petco)

Game Context (6 features)

Feature	Description
`inning`	Current inning (1-9+)
`outs_when_up`	Outs in the inning (0, 1, 2)
`n_thruorder_pitcher`	Times through the order for the pitcher
`runner_on_1b`	Runner on first (0/1)
`runner_on_2b`	Runner on second (0/1)
`runner_on_3b`	Runner on third (0/1)

Categorical Features (2)

Feature	Values
`stand`	L (left) or R (right) – batter’s hitting side
`p_throws`	L (left) or R (right) – pitcher’s throwing arm

Feature Importance: What Drives the Predictions?

The top 15 features ranked by importance in the model:

Rank	Feature	Importance	Category
1	`bat_plat_k_pct`	12.3%	Batter Platoon
2	`bat_k_pct`	6.7%	Batter Overall
3	`bat_plat_bb_pct`	5.9%	Batter Platoon
4	`runner_on_2b`	5.5%	Context
5	`p_whiff_rate`	4.2%	Pitcher
6	`runner_on_3b`	4.0%	Context
7	`runner_on_1b`	3.2%	Context
8	`bat_hr_per_fb`	2.5%	Batter Overall
9	`inning`	2.2%	Context
10	`bat_babip`	2.2%	Batter Overall
11	`bat_iso`	2.1%	Batter Overall
12	`p_xwoba`	1.7%	Pitcher
13	`p_throws`	1.7%	Categorical
14	`bat_bb_pct`	1.7%	Batter Overall
15	`bat_plat_xwoba`	1.6%	Batter Platoon

Several patterns emerge:

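Platoon splits dominate. Two of the top three features are platoon-specific (`bat_plat_k_pct` and `bat_plat_bb_pct`), and two more handedness-related features, `p_throws` and `bat_plat_xwoba`, appear in the top 15. The model has learned that how a batter performs against a specific handedness is more predictive than their overall numbers.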

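Context matters significantly. The three runner-on-base flags collectively account for 12.7% of total importance (5.5% + 4.0% + 3.2%). The model recognizes that pitcher behavior changes with runners in scoring position, and that outcomes such as intentional walks and double plays depend on which bases are occupied.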

Pitcher whiff rate is the top pitching feature at 4.2%. The rate stats (whiff, zone, chase) are more predictive than BNStuff+/BNCtrl+ because they capture the downstream outcomes directly.

The pitch-type features are distributed. Rather than any single bvpt feature dominating, the pitch-weighted composites and per-category stats collectively contribute meaningful signal – they refine predictions when a batter has clear pitch-type weaknesses that align with the pitcher’s arsenal.

Bayesian In-Game Updating

The model doesn’t just make static predictions. During live games, it applies Bayesian adjustments based on what’s actually happening on the mound:

Velocity Adjustment

We track the pitcher’s average fastball velocity tonight compared to their season average. Each 1 mph deviation triggers a proportional adjustment:

Throwing harder than expected: K probability increases, HR probability decreases (faster = harder to square up)
Throwing softer than expected: K probability decreases, HR and BB probability increase (less velocity = more hittable, possibly tiring)

The adjustment only activates when the difference exceeds 0.3 mph, so normal pitch-to-pitch noise doesn't move the prediction.
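
A minimal sketch of the gating and the proportional scaling is shown here. The per-mph effect sizes in `PER_MPH` are hypothetical placeholders, since this article does not publish the actual coefficients, and treating the effect as symmetric for harder and slower velocity is also an assumption.

```python
DEADBAND_MPH = 0.3
# Hypothetical per-mph effect sizes; the real coefficients are not published here.
PER_MPH = {"K": 0.02, "HR": -0.02, "BB": -0.01}

def velocity_multipliers(tonight_velo, season_velo):
    """Return per-outcome multipliers for tonight's fastball velocity delta."""
    delta = tonight_velo - season_velo
    if abs(delta) <= DEADBAND_MPH:
        return {}  # within noise: leave the baseline untouched
    # Proportional: each mph moves the multiplier by the per-mph effect.
    return {outcome: 1.0 + effect * delta for outcome, effect in PER_MPH.items()}
```

A pitcher sitting 1.5 mph above his season average gets a K multiplier above 1 and an HR multiplier below 1. A pitcher sitting 1 mph below gets the reverse, plus a BB multiplier above 1.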

Fatigue Curve

After 75 pitches, the model applies progressive fatigue adjustments:

K probability decreases (up to an 8% relative reduction by 100 pitches)
BB probability increases (up to a 6% relative increase)
HR probability increases (up to a 4% relative increase)

These factors are applied as multipliers on the raw XGBoost probabilities, then renormalized to sum to 1.0. The adjustments are displayed on the gamecast so you can see exactly how the live context is shifting the prediction from the pregame baseline.
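
The multiply-then-renormalize step can be sketched as follows. The maximum shifts come from the list above. The linear ramp between 75 and 100 pitches and the cap beyond 100 are assumptions, as the exact shape of the curve is not specified here.

```python
FATIGUE_START = 75
FATIGUE_FULL = 100
# Maximum relative shifts at full fatigue, from the list above.
MAX_SHIFT = {"K": -0.08, "BB": 0.06, "HR": 0.04}

def apply_fatigue(probs, pitch_count):
    """Scale K/BB/HR by a fatigue multiplier, then renormalize to sum to 1."""
    # Assumption: the ramp is linear between 75 and 100 pitches and capped after.
    ramp = (pitch_count - FATIGUE_START) / (FATIGUE_FULL - FATIGUE_START)
    ramp = min(max(ramp, 0.0), 1.0)
    adjusted = {
        outcome: p * (1.0 + MAX_SHIFT.get(outcome, 0.0) * ramp)
        for outcome, p in probs.items()
    }
    total = sum(adjusted.values())
    return {outcome: p / total for outcome, p in adjusted.items()}
```

Renormalizing matters because the multipliers do not cancel out: cutting K by 8% removes more probability mass than the BB and HR bumps add back, so the remaining outcomes (mostly balls in play) absorb the difference.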

Pregame Predictions

Before lineups are even posted, you can visit the pregame predictions page for any game. Once lineups drop (typically 1-3 hours before first pitch), the page shows:

Every batter’s predicted outcomes against the opposing starter
HR%, K%, Hit%, and OBP for each lineup spot
Hot/cold indicator based on 14-day rolling xwOBA (green arrow = hot, red = cold)
Pitcher arsenal breakdown showing pitch mix, velocity, and BNStuff+
Team totals – expected strikeouts, home runs, hits, and walks for the full lineup

Click any batter’s row to expand the full probability bar chart showing their complete outcome distribution.

This is designed for fans who want to understand matchups before the game and bettors looking for edges on player props. When the model shows a 6.2% HR probability (roughly 1 in 16) and the market is pricing higher or lower, that’s actionable information.

Pitch Selection Model

Alongside the matchup model, we trained a separate XGBoost pitch selection model on 3.5 million individual pitches. This model predicts what pitch type a pitcher is likely to throw given:

The count (balls/strikes)
The game situation (runners, outs, inning)
Batter handedness
Previous pitch type
The pitcher’s full arsenal usage rates

The pitch selection model’s predictions feed into the matchup model by providing context-aware pitch usage weights rather than simple season-average arsenal rates. When the count is 0-2, the pitch selection model knows a pitcher is more likely to throw his put-away pitch than his get-me-over fastball.

Outcome Rate Calibration

The model’s predicted aggregate rates closely match actual rates in the 2025 test set:

Outcome	Actual Rate	Predicted Rate	Delta (pct. points)
K	22.2%	22.2%	0.0
OUT	46.4%	46.8%	+0.4
BB	8.1%	7.6%	-0.5
1B	14.3%	14.3%	0.0
2B	4.2%	4.4%	+0.2
3B	0.3%	0.4%	+0.1
HR	3.1%	3.1%	0.0
HBP	1.1%	1.0%	-0.1
IBB	0.3%	0.3%	0.0

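The aggregate calibration is excellent: predicted rates are within 0.5 percentage points of actual rates across all outcome types, with the largest gap on walks (7.6% predicted vs. 8.1% actual). This shows the model is not systematically over- or under-predicting any outcome across the season. It is a necessary check rather than a complete one: confirming that a 5% HR prediction corresponds to home runs in roughly 5% of those situations requires grouping plate appearances by predicted probability and comparing each group to its observed rate.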
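
A per-prediction version of this check can be sketched as a reliability table: bucket plate appearances by predicted probability for one outcome, then compare the average prediction in each bucket to how often the outcome actually happened. The function below is an illustrative sketch with made-up names, not the evaluation code behind the table above.

```python
def reliability_table(predicted, happened, n_bins=10):
    """Bin predictions for one outcome and compare mean prediction to hit rate.

    predicted: list of model probabilities for the outcome (e.g. HR)
    happened:  list of 0/1 flags, 1 if the PA actually ended in that outcome
    Returns a list of (mean_predicted, observed_rate, count) per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, happened):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            rows.append((sum(p for p, _ in members) / n,
                         sum(y for _, y in members) / n, n))
    return rows
```

For a rare outcome like a home run, where nearly all predictions sit below 10%, a larger `n_bins` (or quantile-based bins) would give a more useful picture than ten equal-width buckets.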

Important Considerations

What the Model Can and Cannot Do

It can predict the probability landscape of a plate appearance based on: who’s batting, who’s pitching, the platoon matchup, the park, the game situation, the pitcher’s full arsenal, how both players have performed recently, and how this batter handles the types of pitches this pitcher throws.

It cannot account for:

Specific pitch sequences within the at-bat. The model works at the PA level, not the pitch level. It doesn’t know the current count.
Defensive positioning and quality. Shifts and defensive metrics aren’t yet in the feature set.
Game-day weather. Wind, temperature, and humidity affect ball flight. Park factors partially capture average conditions.
Injuries or mechanical changes. If a pitcher tweaked his delivery or a batter adjusted his stance, the model relies on historical data that doesn’t reflect the change.

Fallback Behavior

When a batter or pitcher doesn’t have enough data (rookies, early season, September call-ups), the model falls back to league-average profiles for all features including recent form. This produces sensible baseline predictions rather than breaking. As the season progresses, predictions become more player-specific.

What’s Next

Future improvements we’re exploring:

Catcher framing effects – the catcher behind the plate meaningfully shifts K/BB rates, and we already have the data
Umpire strike zone modeling – each umpire has a measurably different zone
Batter hot/cold zone maps – not just which pitch type, but where in the zone
Count-conditional predictions – updating probabilities as the count changes (0-2 vs 3-0)
Defensive quality integration – incorporating OAA and DRS to refine BABIP predictions

The matchup probability panel is live on all gamecast pages during games, and pregame predictions are available for every game with posted lineups. Check it out next time you’re watching – or betting.