Beyond Accuracy: A Stability-Aware Metric for Multi-Horizon Forecasting
This addresses the need for more reliable probabilistic forecasts in time series applications, though it is incremental as it builds on existing seasonal auto-regressive models.
The paper tackled the problem of temporal consistency in multi-horizon forecasting by introducing the forecast AC score, which balances accuracy and stability, resulting in up to 26% improved accuracy for medium-to-long horizons and 91.1% reduced variance in out-of-sample forecasts.
Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency, in other words, how consistently a model predicts the same future event as the forecast origin changes. We introduce the forecast accuracy and coherence score (forecast AC score for short) for measuring the quality of probabilistic multi-horizon forecasts in a way that accounts for both multi-horizon accuracy and stability. Our score additionally allows user-specified weights to balance accuracy and consistency requirements. As an example application, we implement the score as a differentiable objective function for training seasonal auto-regressive integrated models and evaluate it on the M4 Hourly benchmark dataset. Results demonstrate substantial improvements over traditional maximum likelihood estimation. Regarding stability, the AC-optimized model generated out-of-sample forecasts with 91.1\% reduced vertical variance relative to the MLE-fitted model. In terms of accuracy, the AC-optimized model achieved considerable improvements for medium-to-long-horizon forecasts. While one-step-ahead forecasts exhibited a 7.5\% increase in MAPE, all subsequent horizons experienced an improved accuracy as measured by MAPE of up to 26\%. These results indicate that our metric successfully trains models to produce more stable and accurate multi-step forecasts in exchange for some degradation in one-step-ahead performance.