LG, CHEM-PH, DATA-AN, ME
May 17, 2023

Properties of the ENCE and other MAD-based calibration metrics

arXiv:2305.11905v1
9 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses a methodological issue in calibration assessment for regression problems. The contribution is incremental but important for researchers and practitioners who use these metrics.

The paper shows that the Expected Normalized Calibration Error (ENCE) and a calibration error based on the variance of z-scores (ZVE) both depend on the number of bins used in their estimation: for calibrated datasets they scale with the square root of the number of bins. It proposes a way to derive bin-independent values, which also yields a statistical calibration test.

The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in Machine Learning to assess the quality of prediction uncertainties for regression problems. Estimation of the ENCE is based on the binning of calibration data. In this short note, I illustrate an annoying property of the ENCE, i.e. its proportionality to the square root of the number of bins for well calibrated or nearly calibrated datasets. A similar behavior affects the calibration error based on the variance of z-scores (ZVE), and in both cases this property is a consequence of the use of a Mean Absolute Deviation (MAD) statistic to estimate calibration errors. Hence, the question arises of which number of bins to choose for a reliable estimation of calibration error statistics. A solution is proposed to infer ENCE and ZVE values that do not depend on the number of bins for datasets assumed to be calibrated, providing simultaneously a statistical calibration test. It is also shown that the ZVE is less sensitive than the ENCE to outstanding errors or uncertainties.
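The bin dependence described above can be illustrated with a small simulation. The following is a minimal sketch, not the author's code: it assumes the commonly used form of the ENCE (the mean over bins of |RMV - RMSE| / RMV, where RMV is the root mean variance of the predicted uncertainties in a bin and RMSE the root mean squared error in the same bin), equal-size bins ordered by uncertainty, and a synthetic calibrated dataset. The function names `ence` and `mean_ence`, the uniform distribution of uncertainties, and the sample sizes are illustrative assumptions.

```python
import numpy as np


def ence(errors, uncertainties, n_bins):
    """ENCE with equal-size bins ordered by predicted uncertainty (assumed form)."""
    order = np.argsort(uncertainties)
    e_bins = np.array_split(errors[order], n_bins)
    u_bins = np.array_split(uncertainties[order], n_bins)
    terms = []
    for e, u in zip(e_bins, u_bins):
        rmv = np.sqrt(np.mean(u**2))   # root mean variance of the bin
        rmse = np.sqrt(np.mean(e**2))  # root mean squared error of the bin
        terms.append(abs(rmv - rmse) / rmv)
    return float(np.mean(terms))


def mean_ence(n_bins, n_points=10_000, n_rep=20, seed=0):
    """Average ENCE over several synthetic, calibrated datasets."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_rep):
        # Hypothetical calibrated data: each error is drawn from a normal
        # distribution whose standard deviation is the stated uncertainty.
        u = rng.uniform(0.5, 1.5, size=n_points)
        e = rng.normal(0.0, u)
        values.append(ence(e, u, n_bins))
    return float(np.mean(values))


results = {n: mean_ence(n) for n in (10, 40, 160)}
for n, v in results.items():
    print(f"N={n:4d}  ENCE={v:.4f}  ENCE/sqrt(N)={v / np.sqrt(n):.5f}")
```

If the square-root scaling holds for this synthetic data, the last printed column should stay roughly constant while the ENCE itself grows with the number of bins, even though the data are calibrated by construction.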
