Calibration by Distribution Matching: Trainable Kernel Calibration Metrics
This work targets calibration of machine learning models, a prerequisite for reliable uncertainty estimates, and proposes integrating calibration directly into training rather than relying solely on post-hoc recalibration.
The paper tackles the calibration of probabilistic forecasts, where existing post-hoc methods often degrade sharpness, by introducing trainable kernel-based calibration metrics that unify calibration for classification and regression. Used as training regularizers, these metrics improve calibration, sharpness, and decision-making across tasks, outperforming methods that rely solely on post-hoc recalibration.
Calibration ensures that probabilistic forecasts meaningfully capture uncertainty by requiring that predicted probabilities align with empirical frequencies. However, many existing calibration methods are specialized for post-hoc recalibration, which can worsen the sharpness of forecasts. Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. Furthermore, we provide intuitive mechanisms to tailor calibration metrics to a decision task, and to enforce accurate loss estimation and no-regret decisions. Our empirical evaluation demonstrates that employing these metrics as regularizers enhances calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods relying solely on post-hoc recalibration.
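To make the distribution-matching view concrete, here is a minimal sketch in plain Python of one kernel calibration metric for regression. It is our own illustration, not the authors' code. It computes an unbiased estimate of the squared maximum mean discrepancy (MMD) between observed pairs (f(X), Y) and simulated pairs (f(X), Ŷ), where Ŷ is drawn from the forecast itself. If the forecaster is distributionally calibrated, the two joint distributions coincide and the estimate is close to zero. The Gaussian forecasts, the product RBF kernel and its bandwidths, and the function names `rbf`, `joint_kernel`, and `calibration_mmd2` are all hypothetical choices made for illustration.

```python
import math
import random


def rbf(a, b, lengthscale):
    """Squared-exponential (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def joint_kernel(p, q, ls_mu=1.0, ls_sigma=1.0, ls_y=1.0):
    """Product kernel on (forecast mean, forecast std, outcome) triples.

    The bandwidths are illustrative assumptions, not values from the paper.
    """
    return rbf(p[0], q[0], ls_mu) * rbf(p[1], q[1], ls_sigma) * rbf(p[2], q[2], ls_y)


def calibration_mmd2(mus, sigmas, ys, rng):
    """Unbiased MMD^2 between samples of (f(X), Y) and (f(X), Y_hat), Y_hat ~ f(X).

    Y_hat is drawn by reparameterization (mu + sigma * eps), so in an autodiff
    framework the estimate would be differentiable in (mu, sigma).
    """
    n = len(ys)
    observed = [(m, s, y) for m, s, y in zip(mus, sigmas, ys)]
    simulated = [(m, s, m + s * rng.gauss(0.0, 1.0)) for m, s in zip(mus, sigmas)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                # Skip i == j: drops self-similarity terms and the dependent
                # (observed_i, simulated_i) pairs that share one forecast.
                continue
            total += (joint_kernel(observed[i], observed[j])
                      + joint_kernel(simulated[i], simulated[j])
                      - joint_kernel(observed[i], simulated[j])
                      - joint_kernel(observed[j], simulated[i]))
    return total / (n * (n - 1))


rng = random.Random(0)
n = 150
mus = [rng.uniform(-2.0, 2.0) for _ in range(n)]
sigmas = [1.0] * n
# Calibrated: outcomes really are drawn from the forecast N(mu, 1).
ys_good = [m + rng.gauss(0.0, 1.0) for m in mus]
# Miscalibrated: outcomes are shifted by +2 relative to the forecast.
ys_bad = [m + 2.0 + rng.gauss(0.0, 1.0) for m in mus]

mmd_good = calibration_mmd2(mus, sigmas, ys_good, rng)
mmd_bad = calibration_mmd2(mus, sigmas, ys_bad, rng)
print(f"calibrated: {mmd_good:.4f}, miscalibrated: {mmd_bad:.4f}")
```

In a training loop, one would compute the same estimate in an autodiff framework and add it, scaled by a weight (a hypothetical hyperparameter), to the usual loss, such as the negative log-likelihood. The reparameterized draw of Ŷ keeps the estimate differentiable in the forecast parameters. The O(n²) double loop is written for clarity; a practical version would vectorize it over minibatches.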