Combining Machine Learning and Human Experts to Predict Match Outcomes in Football: A Baseline Model
This work provides a new benchmark and improved baseline for researchers and practitioners interested in predicting football match outcomes, offering a notable gain over existing statistical approaches.
This paper introduces a new benchmark dataset and baseline models for predicting football match outcomes. The models, which combine statistical match data and contextual articles from human sports journalists, achieve an accuracy of 63.18%, demonstrating a 6.9% improvement over traditional statistical methods.
In this paper, we present a new application-focused benchmark dataset and results from a set of baseline Natural Language Processing and Machine Learning models for predicting the outcomes of football (soccer) matches. In doing so, we establish a baseline for the prediction accuracy achievable by exploiting both statistical match data and contextual articles written by human sports journalists. Our dataset focuses on a representative period of 6 seasons of the English Premier League and includes newspaper match previews from The Guardian. The models presented in this paper achieve an accuracy of 63.18%, a 6.9% boost over traditional statistical methods.
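As a rough illustration of what combining the two sources can look like, the sketch below joins numeric match statistics with bag-of-words counts from a match preview into one feature vector and scores predictions by accuracy. It is only a minimal sketch under stated assumptions, not the models evaluated in this paper: the function names, the vocabulary, the outcome labels, and all example values are hypothetical.

```python
import re
from collections import Counter


def text_features(preview, vocabulary):
    """Bag-of-words counts of a match preview over a fixed vocabulary.

    Assumption: simple lowercase word counts stand in for whatever
    text representation a real model would use.
    """
    tokens = re.findall(r"[a-z']+", preview.lower())
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def combine_features(stats, preview, vocabulary):
    """Early fusion: numeric match statistics followed by text counts."""
    return [float(value) for value in stats] + text_features(preview, vocabulary)


def accuracy(predicted, actual):
    """Fraction of matches whose predicted outcome equals the real one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy inputs, not taken from the dataset.
vocabulary = ["injured", "form"]
features = combine_features(
    [1.5, 0.8], "Key striker injured, poor away form", vocabulary
)
score = accuracy(
    ["home_win", "draw", "away_win", "home_win"],
    ["home_win", "away_win", "away_win", "home_win"],
)
```

Any classifier could then be trained on vectors like `features`. Accuracy is used here because it is the measure reported above, computed over the three possible outcomes (home win, draw, away win).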