ML LG | Feb 17, 2025

All Models Are Miscalibrated, But Some Less So: Comparing Calibration with Conditional Mean Operators

arXiv:2502.11465v1 | 10.31 citations | h-index: 4 | AI

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable calibration metrics in high-risk settings, offering an incremental improvement over existing methods for model comparison.

The paper tackles the problem of distinguishing which probabilistic predictive models are better calibrated by proposing the conditional kernel calibration error (CKCE), which is based on the Hilbert-Schmidt norm of conditional mean operators. Experiments on synthetic and real data show that CKCE provides a more consistent ranking of models by calibration error and is more robust against distribution shift.

When working in a high-risk setting, having well-calibrated probabilistic predictive models is a crucial requirement. However, estimators for calibration error are not always able to correctly distinguish which model is better calibrated. We propose the conditional kernel calibration error (CKCE), which is based on the Hilbert-Schmidt norm of the difference between conditional mean operators. By working directly with the definition of strong calibration as the distance between conditional distributions, which we represent by their embeddings in reproducing kernel Hilbert spaces, the CKCE is less sensitive to the marginal distribution of predictive models. This makes it more effective for relative comparisons than previously proposed calibration metrics. Our experiments, using both synthetic and real data, show that CKCE provides a more consistent ranking of models by their calibration error and is more robust against distribution shift.
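To make the idea of a Hilbert-Schmidt norm between conditional mean operators concrete, here is a minimal sketch for the classification case. It is not taken from the paper and may differ from the authors' estimator. It assumes a Gaussian kernel on the predicted probability vectors, a one-hot (delta-kernel) embedding of the labels, and a standard ridge-regularized plug-in estimate of each operator. The names `rbf_gram` and `ckce_sketch` and the parameters `bandwidth` and `reg` are hypothetical. Under these assumptions, the squared norm of the difference between the operator for the observed labels and the operator for labels drawn from the predictions reduces to a trace of Gram matrices.

```python
import numpy as np


def rbf_gram(q, bandwidth=1.0):
    """Gaussian-kernel Gram matrix over the rows of q (predicted probability vectors)."""
    sq = np.sum(q ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * q @ q.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def ckce_sketch(probs, labels, bandwidth=1.0, reg=1e-3):
    """Plug-in squared Hilbert-Schmidt norm between two conditional mean operators.

    Assumption (not confirmed by the source): both operators are estimated
    with kernel ridge regression on the predictions, and labels use a
    one-hot embedding, so a predicted distribution embeds as itself.
    """
    n, m = probs.shape
    K = rbf_gram(probs, bandwidth)
    # W = (K + n * reg * I)^{-1}, the usual regularized inverse.
    W = np.linalg.solve(K + n * reg * np.eye(n), np.eye(n))
    onehot = np.eye(m)[labels]
    # Row i is the embedding of the observed label minus the embedding of prediction i.
    D = onehot - probs
    G = D @ D.T
    # ||D^T W Phi^T||_HS^2 = trace(W K W G), with Phi the prediction feature map.
    return float(np.trace(W @ K @ W @ G))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.uniform(0.05, 0.95, size=200)
    probs = np.column_stack([1.0 - p, p])
    labels = (rng.uniform(size=200) < p).astype(int)
    print(ckce_sketch(probs, labels))
```

The value depends strongly on `bandwidth` and `reg`, and this plug-in form is biased, so it should be read as an illustration of the operator-norm construction rather than as a calibrated score for ranking models.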
