ML, LG · Aug 23, 2024

On the good reliability of an interval-based metric to validate prediction uncertainty for machine learning regression tasks

arXiv:2408.13089v2 · h-index: 3

AI Analysis

This work addresses the need for more reliable uncertainty validation in regression tasks, particularly for domains like molecular properties, but it is incremental as it adapts existing interval-based approaches.

The study tackled the problem of validating prediction uncertainty calibration in machine learning regression by proposing a shift from variance-based metrics to an interval-based metric (PICP), showing that it enables testing 20% more datasets than a variance-based method with more reliable results.

This short study presents an opportunistic approach to a (more) reliable validation method for prediction uncertainty average calibration. Considering that variance-based calibration metrics (ZMS, NLL, RCE, ...) are quite sensitive to the presence of heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP). It is shown on a large ensemble of molecular properties datasets that (1) sets of z-scores are well represented by Student's $t(ν)$ distributions, $ν$ being the number of degrees of freedom; (2) accurate estimation of 95% prediction intervals can be obtained by the simple $2σ$ rule for $ν>3$; and (3) the resulting PICPs are more quickly and reliably tested than variance-based calibration metrics. Overall, this method makes it possible to test 20% more datasets than ZMS testing. Conditional calibration is also assessed using the PICP approach.
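To make the PICP idea concrete, here is a minimal Python sketch of the $2σ$ rule described in the abstract: count the fraction of errors that fall within two predicted standard deviations and compare it with the 95% target. The function names, the synthetic Gaussian data, and the use of a Wilson score interval to judge consistency with 0.95 are all assumptions made for illustration; the abstract does not specify which statistical test the study applies.

```python
import math
import random


def picp(errors, uncertainties, k=2.0):
    """Fraction of errors inside the +/- k*sigma prediction interval."""
    inside = sum(1 for e, u in zip(errors, uncertainties) if abs(e) <= k * u)
    return inside / len(errors)


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion (an assumed choice of test)."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Synthetic, well-calibrated example: each error is drawn with a standard
# deviation equal to its predicted uncertainty (hypothetical data).
rng = random.Random(0)
n = 5000
uncertainties = [rng.uniform(0.5, 2.0) for _ in range(n)]
errors = [rng.gauss(0.0, u) for u in uncertainties]

coverage = picp(errors, uncertainties)
low, high = wilson_interval(coverage, n)
consistent = low <= 0.95 <= high  # does the interval contain the 95% target?
print(f"PICP = {coverage:.3f}, interval = [{low:.3f}, {high:.3f}], "
      f"consistent with 0.95: {consistent}")
```

Because the check only counts how many errors land inside the interval, a few extreme z-scores change the result very little, which is the robustness to heavy tails that motivates the shift away from variance-based metrics.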
