ML LG AP

Sep 16, 2020

Kaggle forecasting competitions: An overlooked learning opportunity

Casper Solheim Bojer, Jens Peder Meldgaard

arXiv:2009.07701v1

253 citations

Originality: Synthesis-oriented

AI Analysis

The paper highlights overlooked learning opportunities in Kaggle competitions for the forecasting community, but its contribution is incremental because it reviews existing competition results rather than presenting new data.

The paper reviewed six Kaggle forecasting competitions, finding that these datasets have higher intermittence and entropy than M-competitions, and that global ensemble models outperform local single models, with gradient boosted decision trees and neural networks showing strong performance.

Competitions play an invaluable role in the field of forecasting, as exemplified by the recent M4 competition. The competition received attention from both academics and practitioners and sparked discussions about how representative its data are of business forecasting. Several competitions featuring real-life business forecasting tasks on the Kaggle platform have, however, been largely ignored by the academic community. We believe the learnings from these competitions have much to offer the forecasting community, and we provide a review of the results from six Kaggle competitions. We find that most of the Kaggle datasets are characterized by higher intermittence and entropy than the M-competitions and that global ensemble models tend to outperform local single models. Furthermore, we observe strong performance from gradient boosted decision trees, increasing success of neural networks for forecasting, and a variety of techniques for adapting machine learning models to the forecasting task.
