ML LG Aug 23, 2024

On the good reliability of an interval-based metric to validate prediction uncertainty for machine learning regression tasks

arXiv:2408.13089v2
h-index: 3
AI Analysis

This work addresses the need for more reliable uncertainty validation in regression tasks, particularly for domains like molecular properties, but it is incremental as it adapts existing interval-based approaches.

The study addresses the validation of prediction uncertainty calibration in machine learning regression by proposing a shift from variance-based metrics to an interval-based metric (PICP), and shows that PICP testing can be applied to 20% more datasets than ZMS testing, with more reliable results.

This short study presents an opportunistic approach to a more reliable validation method for the average calibration of prediction uncertainty. Because variance-based calibration metrics (ZMS, NLL, RCE...) are quite sensitive to heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP). It is shown on a large ensemble of molecular properties datasets that (1) sets of z-scores are well represented by Student's $t(ν)$ distributions, $ν$ being the number of degrees of freedom; (2) accurate estimation of 95% prediction intervals can be obtained by the simple $2σ$ rule for $ν>3$; and (3) the resulting PICPs are more quickly and reliably tested than variance-based calibration metrics. Overall, this method makes it possible to test 20% more datasets than ZMS testing. Conditional calibration is also assessed using the PICP approach.
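To make the PICP idea concrete, here is a minimal sketch, not the paper's implementation. It assumes errors `E` and uncertainties `u`, takes the 95% prediction interval from the $2σ$ rule as $[-2u, +2u]$, and computes the fraction of errors that fall inside it. The function names (`picp`, `wilson_interval`, `simulate`), the synthetic data, and the use of a Wilson score interval to judge compatibility with the 0.95 target are all assumptions made for illustration. The simulated z-scores follow a Student's $t(ν)$ distribution with $ν = 6$, rescaled to unit variance.

```python
import math
import random


def picp(errors, uncertainties, k=2.0):
    """Fraction of errors lying inside the interval [-k*u, +k*u]."""
    inside = sum(1 for e, u in zip(errors, uncertainties) if abs(e) <= k * u)
    return inside / len(errors)


def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def simulate(n=5000, nu=6, seed=0):
    """Synthetic, calibrated data: z-scores are Student's t(nu), unit variance."""
    rng = random.Random(seed)
    scale = math.sqrt((nu - 2) / nu)  # rescales t(nu) to variance 1 (needs nu > 2)
    errors, uncertainties = [], []
    for _ in range(n):
        u = rng.uniform(0.5, 2.0)  # hypothetical prediction uncertainty
        chi2 = rng.gammavariate(nu / 2, 2.0)  # chi-squared with nu degrees of freedom
        t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)  # Student's t(nu) variate
        errors.append(u * scale * t)
        uncertainties.append(u)
    return errors, uncertainties


errors, uncertainties = simulate()
coverage = picp(errors, uncertainties)
low, high = wilson_interval(coverage, len(errors))
print(f"PICP = {coverage:.3f}, approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

In this sketch, calibration would be judged acceptable when the target coverage of 0.95 lies inside the reported interval. The choice of interval estimator is a design decision of the sketch, and other binomial intervals could be substituted.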

Code Implementations: 1 repo