LG | ML | Apr 19, 2023

Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts

arXiv:2304.09836v2 | 10 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the problem of unreliable evaluation metrics for multivariate probabilistic forecasts. The advance is incremental: it builds on existing scoring rules but provides new finite-sample insights.

The paper systematically studies the finite-sample reliability of proper scoring rules for evaluating multivariate probabilistic time series forecasts, identifying 'regions of reliability' where these rules can detect forecasting errors, and reveals critical shortcomings in current evaluation practices.

Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expectation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation. Through a power analysis, we identify the "region of reliability" of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.
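To make the finite-sample issue concrete, the sketch below estimates how often a proper scoring rule ranks the ground-truth forecast ahead of a misspecified one when only a few observations are available. It is an illustration under stated assumptions, not the paper's benchmark: the Energy Score is used as one well-known proper scoring rule, the ground truth is a standard Gaussian, the only forecasting error is a mean shift, and the "detection rate" is a simplified stand-in for a formal power analysis. The names `energy_score` and `detection_rate` and all parameter values are hypothetical.

```python
import numpy as np


def energy_score(samples, y):
    """Sample-based estimate of the Energy Score (lower is better).

    samples: (m, d) array of draws from the forecast distribution.
    y:       (d,) array, the realized observation.
    """
    m = samples.shape[0]
    # E||X - y||: mean distance between forecast draws and the observation.
    term_obs = np.linalg.norm(samples - y, axis=1).mean()
    # E||X - X'||: mean distance over distinct pairs of forecast draws.
    pairwise = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    term_spread = pairwise.sum() / (m * (m - 1))
    return term_obs - 0.5 * term_spread


def detection_rate(n_obs, shift, d=2, m=100, n_trials=100, seed=0):
    """Fraction of trials in which the ground-truth forecast obtains a lower
    total score than a mean-shifted forecast over n_obs observations.

    Assumption for illustration: ground truth is N(0, I) in d dimensions and
    the wrong forecast is N(shift, I).
    """
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(n_trials):
        diff = 0.0
        for _ in range(n_obs):
            y = rng.standard_normal(d)                 # observation from ground truth
            good = rng.standard_normal((m, d))         # draws from the correct forecast
            bad = rng.standard_normal((m, d)) + shift  # draws from the shifted forecast
            diff += energy_score(bad, y) - energy_score(good, y)
        wins += diff > 0  # the error is detected when the wrong forecast scores worse
    return wins / n_trials
```

Calling `detection_rate` with a small `shift` and increasing `n_obs` gives a rough picture of how many observations are needed before the score separates the two forecasts reliably, which is the kind of condition a "region of reliability" describes.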

Code Implementations: 1 repo
