AI · Sep 22, 2024

Scoring rule nets: beyond mean target prediction in multivariate regression

arXiv:2409.14456v1 · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses a critical issue for practitioners who need accurate variance estimates in multivariate regression, though the advance is incremental because it builds on existing scoring rules.

The paper tackles the problem of overestimated variance in multivariate probabilistic regression models trained with maximum likelihood estimation by proposing Conditional CRPS, a multivariate strictly proper scoring rule. It shows that this method often outperforms MLE and produces results comparable to state-of-the-art non-parametric models like Distributional Random Forest in experiments on synthetic and real data.

Probabilistic regression models trained with maximum likelihood estimation (MLE) can sometimes overestimate variance to an unacceptable degree. This is primarily a problem in the multivariate domain. While univariate models often optimize the popular Continuous Ranked Probability Score (CRPS), no comparable alternative to MLE has yet been widely accepted in the multivariate domain. The Energy Score, the most investigated alternative, notoriously lacks both closed-form expressions and sensitivity to the correlation between target variables. In this paper, we propose Conditional CRPS: a multivariate strictly proper scoring rule that extends CRPS. We show that closed-form expressions exist for popular distributions and illustrate their sensitivity to correlation. We then show, in a variety of experiments on both synthetic and real data, that Conditional CRPS often outperforms MLE and produces results comparable to state-of-the-art non-parametric models, such as Distributional Random Forest (DRF).
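To make the idea concrete, here is a minimal Python sketch for a bivariate Gaussian. It assumes, as an illustration only, that a conditional CRPS can be formed by summing univariate CRPS terms along a chain of conditionals: the CRPS of the marginal of Y1 plus the CRPS of Y2 given Y1 = y1. The paper's exact conditioning scheme may differ. Both terms use the standard closed-form CRPS of a Gaussian; the function names `crps_gaussian` and `conditional_crps_bivariate` are hypothetical.

```python
import math


def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of N(mu, sigma^2) evaluated at observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def conditional_crps_bivariate(mu1: float, mu2: float, s1: float, s2: float,
                               rho: float, y1: float, y2: float) -> float:
    """Hypothetical conditional CRPS for a bivariate Gaussian (assumed chain form).

    Sums the CRPS of the marginal Y1 ~ N(mu1, s1^2) and the CRPS of the
    conditional Y2 | Y1 = y1, which is Gaussian with the mean and standard
    deviation computed below. Requires |rho| < 1.
    """
    marginal_term = crps_gaussian(mu1, s1, y1)
    # Conditional distribution of Y2 given the observed y1.
    cond_mu = mu2 + rho * (s2 / s1) * (y1 - mu1)
    cond_sigma = s2 * math.sqrt(1.0 - rho * rho)
    conditional_term = crps_gaussian(cond_mu, cond_sigma, y2)
    return marginal_term + conditional_term


if __name__ == "__main__":
    # Same marginals, different correlations; observation has both components above the mean.
    for rho in (0.8, 0.0, -0.8):
        score = conditional_crps_bivariate(0.0, 0.0, 1.0, 1.0, rho, 1.0, 1.0)
        print(f"rho={rho:+.1f}  score={score:.3f}")
```

Because the conditional mean and spread of Y2 depend on ρ, the score changes with the correlation even though both marginals are identical. For an observation where both components lie above their means, the positively correlated forecast receives the lowest score. Under these assumptions, this is the kind of correlation sensitivity the abstract contrasts with the Energy Score.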
