SPAIAP Nov 28, 2025

What If They Took the Shot? A Hierarchical Bayesian Framework for Counterfactual Expected Goals

arXiv:2511.23072v1
2 citations
Originality: Incremental advance
AI Analysis

The framework provides an uncertainty-aware tool for player evaluation, recruitment, and tactical planning in soccer, and may apply to other domains where individual skill and contextual factors jointly shape performance. It is an incremental improvement over existing methods.

This study develops a hierarchical Bayesian framework that quantifies player-specific effects in expected goals (xG) estimation, addressing a limitation of standard models, which treat all players as identical finishers. It reports strong external validity (hierarchical and baseline predictions agree at R2 = 0.75, and an XGBoost benchmark validated against StatsBomb xG reaches R2 = 0.833) and supports counterfactual analyses, for example estimating that Sansone would generate +2.2 xG from Berardi's chances.

This study develops a hierarchical Bayesian framework that integrates expert domain knowledge to quantify player-specific effects in expected goals (xG) estimation, addressing a limitation of standard models that treat all players as identical finishers. Using 9,970 shots from StatsBomb's 2015-16 data and Football Manager 2017 ratings, we combine Bayesian logistic regression with informed priors to stabilise player-level estimates, especially for players with few shots. The hierarchical model reduces posterior uncertainty relative to weak priors and achieves strong external validity: hierarchical and baseline predictions correlate at R2 = 0.75, while an XGBoost benchmark validated against StatsBomb xG reaches R2 = 0.833. The model uncovers interpretable specialisation profiles, including one-on-one finishing (Aguero, Suarez, Belotti, Immobile, Martial), long-range shooting (Pogba), and first-touch execution (Insigne, Salah, Gameiro). It also identifies latent ability in underperforming players such as Immobile and Belotti. The framework supports counterfactual "what-if" analysis by reallocating shots between players under identical contexts. Case studies show that Sansone would generate +2.2 xG from Berardi's chances, driven largely by high-pressure situations, while Vardy-Giroud substitutions reveal strong asymmetry: replacing Vardy with Giroud results in a large decline (about -7 xG), whereas the reverse substitution has only a small effect (about -1 xG). This work provides an uncertainty-aware tool for player evaluation, recruitment, and tactical planning, and offers a general approach for domains where individual skill and contextual factors jointly shape performance.
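
The counterfactual "what-if" step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a logistic shot model with shared coefficients `beta` plus player-specific offsets `gamma`, and it stands in for a fitted posterior with randomly generated draws. The feature names, coefficient values, and function names (`total_xg`, `counterfactual_delta`) are all hypothetical. The idea is to hold the shot contexts fixed, swap only the player effects, and compare total xG draw by draw so that the result carries uncertainty.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def total_xg(X, beta, gamma_draws):
    """Total xG over a set of shots, one value per posterior draw.

    X           : (n_shots, n_features) shot contexts
    beta        : (n_features,) shared coefficients (assumed fixed here)
    gamma_draws : (n_draws, n_features) player-specific offsets
    """
    coefs = beta + gamma_draws        # (n_draws, n_features)
    probs = sigmoid(X @ coefs.T)      # (n_shots, n_draws) scoring probabilities
    return probs.sum(axis=0)          # (n_draws,)

def counterfactual_delta(X, beta, gamma_actual, gamma_substitute):
    """Change in total xG if the substitute took the same shots."""
    return total_xg(X, beta, gamma_substitute) - total_xg(X, beta, gamma_actual)

# Hypothetical data: intercept, standardised distance, under-pressure flag.
rng = np.random.default_rng(0)
n_shots, n_draws = 60, 2000
X = np.column_stack([
    np.ones(n_shots),
    rng.normal(size=n_shots),
    rng.integers(0, 2, size=n_shots),
])
beta = np.array([-2.0, -0.7, -0.6])  # illustrative values, not estimates

# Stand-ins for posterior draws of two players' effects (assumed, not fitted).
gamma_a = rng.normal([0.0, 0.0, -0.2], 0.15, size=(n_draws, 3))
gamma_b = rng.normal([0.1, 0.0, 0.3], 0.15, size=(n_draws, 3))

delta = counterfactual_delta(X, beta, gamma_a, gamma_b)
low, high = np.percentile(delta, [2.5, 97.5])
print(f"mean delta xG: {delta.mean():+.2f}, 95% interval: [{low:+.2f}, {high:+.2f}]")
```

Because the offsets act per feature, a player whose advantage sits on the under-pressure term gains most on shots taken under pressure, which is the kind of context-driven difference the Sansone and Berardi case study describes. Swapping the two arguments does not simply flip the sign in a full analysis, since each player's own shot set `X` differs; that is one way an asymmetry like the Vardy and Giroud result could arise.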

