An Empirical Comparison of Algorithms for Aggregating Expert Predictions
This is an incremental study aimed at researchers and practitioners in prediction aggregation. It focuses on sports forecasting, so its generalizability is limited.
The paper tackles the problem of aggregating expert predictions of NFL game outcomes and compares various algorithms. It finds that simple averaging is hard to beat in accuracy, but that a Bayesian method which estimates each expert's variance consistently improves quadratic loss.
Predicting the outcomes of future events is a challenging problem for which many solution methods have been explored. We present an empirical comparison of online and offline adaptive algorithms for aggregating experts' predictions of the outcomes of five years of US National Football League games (1319 games), using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm that estimates the variance of each expert's prediction shows the most consistent improvement over simple averaging among our collection of algorithms.
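The abstract contrasts simple averaging with an algorithm that estimates each expert's variance, and scores them on accuracy and quadratic loss. The following Python sketch illustrates those two criteria on made-up data. It assumes each game's input is a list of expert probabilities that the home team wins. The inverse-variance rule, its prior parameters (`prior_var`, `prior_n`), and all function names are hypothetical. It uses each expert's past mean squared error as a plug-in stand-in for variance and is not the paper's Bayesian estimation algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def simple_average(preds: Sequence[float]) -> float:
    """Unweighted mean of the experts' win probabilities for one game."""
    return sum(preds) / len(preds)


def inverse_variance_average(preds: Sequence[float], variances: Sequence[float]) -> float:
    """Weight each expert by 1 / (estimated error variance), then normalize."""
    weights = [1.0 / v for v in variances]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


def quadratic_loss(p: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome."""
    return (p - outcome) ** 2


def is_correct(p: float, outcome: int) -> bool:
    """Accuracy: pick the home team iff p > 0.5 (p == 0.5 picks the away team)."""
    return int(p > 0.5) == outcome


def evaluate(
    games: List[List[float]],
    outcomes: List[int],
    prior_var: float = 0.25,
    prior_n: float = 1.0,
) -> Dict[str, Tuple[float, float]]:
    """Score both aggregators online; returns {name: (mean loss, accuracy)}.

    Hypothetical choice: each expert's variance is approximated by its mean
    squared error on *earlier* games only, shrunk toward prior_var with
    pseudo-count prior_n so that no estimate is ever zero.
    """
    sse = [0.0] * len(games[0])  # running sum of squared errors per expert
    seen = 0
    totals = {"average": [0.0, 0], "inverse_variance": [0.0, 0]}
    for preds, y in zip(games, outcomes):
        variances = [(prior_var * prior_n + s) / (prior_n + seen) for s in sse]
        forecasts = {
            "average": simple_average(preds),
            "inverse_variance": inverse_variance_average(preds, variances),
        }
        for name, p in forecasts.items():
            totals[name][0] += quadratic_loss(p, y)
            totals[name][1] += int(is_correct(p, y))
        # Update the error statistics only after scoring this game.
        for j, p in enumerate(preds):
            sse[j] += (p - y) ** 2
        seen += 1
    n = len(outcomes)
    return {name: (loss / n, hits / n) for name, (loss, hits) in totals.items()}


if __name__ == "__main__":
    # Toy data (made up): two experts, two games, the home team won both.
    games = [[0.9, 0.3], [0.8, 0.4]]
    outcomes = [1, 1]
    for name, (loss, acc) in evaluate(games, outcomes).items():
        print(f"{name}: quadratic loss={loss:.4f}, accuracy={acc:.2f}")
```

On this toy data both rules pick the winner in both games, so their accuracy is identical. However, the weighted rule has a lower mean quadratic loss (about 0.126 versus 0.160) because it shifts weight toward the expert with smaller past errors. This loosely mirrors the pattern the abstract reports, though a two-game example shows nothing about real performance.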