MLLG · Jun 24, 2024

Forecasting with Deep Learning: Beyond Average of Average of Average Performance

arXiv:2406.16590v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more nuanced evaluation of forecasting models, a concern for both researchers and practitioners. Its contribution is incremental: it proposes a new evaluation framework that builds on existing methods.

The paper argues that averaging performance metrics dilutes important information about the relative performance of models under different conditions, such as the forecasting horizon or the presence of anomalies. It shows that although NHITS generally performs best, its advantage depends on the conditions: for example, it outperforms classical methods only for multi-step ahead forecasting, and it is outperformed by Theta when dealing with anomalies.

Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. We hypothesize that averaging performance over all samples dilutes relevant information about the relative performance of models, in particular about the conditions under which this relative performance differs from what the overall accuracy suggests. We address this limitation by proposing a novel framework for evaluating univariate time series forecasting models from multiple perspectives, such as one-step ahead versus multi-step ahead forecasting. We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques. While classical methods (e.g. ARIMA) are long-standing approaches to forecasting, deep neural networks (e.g. NHITS) have recently shown state-of-the-art forecasting performance on benchmark datasets. Extensive experiments show that NHITS generally performs best, but its superiority varies with forecasting conditions. For instance, concerning the forecasting horizon, NHITS outperforms classical approaches only for multi-step ahead forecasting. Another relevant insight is that, when dealing with anomalies, NHITS is outperformed by methods such as Theta. These findings highlight the importance of aspect-based model evaluation.
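To make the contrast between a single averaged score and an aspect-based breakdown concrete, here is a minimal sketch in Python with NumPy. It assumes the common M4-style SMAPE definition, SMAPE = (100/n) Σ 2|y − ŷ| / (|y| + |ŷ|), forecasts stored as an array of shape (n_windows, horizon), and a boolean anomaly flag for each target point. The function names `smape` and `aspect_scores` and the anomaly flag are hypothetical illustrations, not the authors' implementation or the code in the linked repository. The sketch reports the overall SMAPE alongside per-horizon scores (one-step versus multi-step ahead) and separate scores for anomalous and normal points.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed M4-style definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Replace zero denominators with 1 to avoid 0/0; those points score 0 below.
    safe = np.where(denom == 0, 1.0, denom)
    ratio = np.where(denom == 0, 0.0, 2.0 * np.abs(y_true - y_pred) / safe)
    return float(100.0 * ratio.mean())


def aspect_scores(y_true, y_pred, is_anomaly):
    """Report SMAPE overall, per horizon step, and split by anomaly status.

    y_true, y_pred: arrays of shape (n_windows, horizon).
    is_anomaly: boolean array of the same shape (hypothetical flag for
    target points labelled as anomalous).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(is_anomaly, dtype=bool)
    # The single averaged score that standard practice reports.
    scores = {"overall": smape(y_true, y_pred)}
    # One-step ahead (h=1) versus later, multi-step ahead horizons.
    for h in range(y_true.shape[1]):
        scores[f"h={h + 1}"] = smape(y_true[:, h], y_pred[:, h])
    # Anomalous versus normal target points.
    if mask.any():
        scores["anomalous"] = smape(y_true[mask], y_pred[mask])
    if (~mask).any():
        scores["normal"] = smape(y_true[~mask], y_pred[~mask])
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only (not results from the paper).
    actual = np.array([[100.0, 100.0, 100.0], [50.0, 50.0, 200.0]])
    flags = np.array([[False, False, False], [False, False, True]])
    model_a = np.array([[110.0, 100.0, 100.0], [55.0, 50.0, 120.0]])
    model_b = np.array([[100.0, 105.0, 110.0], [50.0, 52.0, 190.0]])
    for name, pred in [("A", model_a), ("B", model_b)]:
        scores = aspect_scores(actual, pred, flags)
        print(name, {k: round(v, 2) for k, v in scores.items()})
```

On this toy data, model B has the lower overall SMAPE, yet model A is better at h=2; reading only the `overall` entry would hide that reversal, which is the kind of diluted information the abstract describes.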

Code Implementations: 1 repo