Objective Mispricing Detection for Shortlisting Undervalued Football Players via Market Dynamics and News Signals
This work addresses objective player valuation for football scouting workflows and offers a reproducible decision-support tool. Its contribution is incremental, since it builds on existing methods such as gradient-boosted regression and NLP features.
The paper addresses the problem of identifying undervalued football players. It develops a framework that estimates expected market value from structured data and compares it with observed valuations to detect mispricing. ROC-AUC ablations show that market dynamics are the primary signal, while NLP features provide secondary gains that improve robustness and interpretability.
We present a practical, reproducible framework for identifying undervalued football players grounded in objective mispricing. Instead of relying on subjective expert labels, we estimate an expected market value from structured data (historical market dynamics, biographical and contract features, transfer history) and compare it to the observed valuation to define mispricing. We then assess whether news-derived Natural Language Processing (NLP) features (i.e., sentiment statistics and semantic embeddings from football articles) complement market signals for shortlisting undervalued players. Using a chronological (leakage-aware) evaluation, gradient-boosted regression explains a large share of the variance in log-transformed market value. For undervaluation shortlisting, ROC-AUC-based ablations show that market dynamics are the primary signal, while NLP features provide consistent, secondary gains that improve robustness and interpretability. SHAP analyses suggest the dominance of market trends and age, with news-derived volatility cues amplifying signals in high-uncertainty regimes. The proposed pipeline is designed for decision support in scouting workflows, emphasizing ranking/shortlisting over hard classification thresholds, and includes a concise reproducibility and ethics statement.
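The core idea of comparing an expected value with the observed valuation, then ranking rather than thresholding, can be illustrated with a minimal sketch. The abstract does not give the exact mispricing formula, so the log-ratio used here is an assumption (chosen because the regression target is log-transformed market value). The function names, the player records, and the binary labels passed to `roc_auc` are all hypothetical and only serve the illustration.

```python
import math


def mispricing(expected_value, observed_value):
    """Assumed definition: log(expected) - log(observed).

    Positive values mean the model's expected value exceeds the observed
    valuation, i.e. the player looks undervalued.
    """
    return math.log(expected_value) - math.log(observed_value)


def shortlist(players, k):
    """Rank players by mispricing (largest first) and return the top-k names."""
    scored = [(mispricing(p["expected"], p["observed"]), p["name"]) for p in players]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical records: expected values would come from the regression model,
# observed values from the market (both in the same currency unit).
players = [
    {"name": "player_a", "expected": 12.0, "observed": 8.0},
    {"name": "player_b", "expected": 5.0, "observed": 6.0},
    {"name": "player_c", "expected": 20.0, "observed": 19.0},
    {"name": "player_d", "expected": 3.0, "observed": 1.5},
]

top_two = shortlist(players, k=2)
```

Because the output is a ranking, a scout can choose the shortlist length `k` to match available capacity instead of committing to a fixed classification threshold, which matches the emphasis on ranking and shortlisting stated above.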