PAC-Bayes Analysis for Recalibration in Classification
This work addresses calibration in machine learning models, with a focus on multiclass classification. The contribution is incremental: it extends the existing analysis for binary classification to the multiclass setting and derives new theoretical bounds.
The paper tackles the lack of theoretical guarantees for calibration error in multiclass classification and for parametric recalibration algorithms. It derives an optimizable upper bound using PAC-Bayes analysis, and from that bound a recalibration algorithm that improves performance on benchmark datasets.
Nonparametric estimation using uniform-width binning is a standard approach for evaluating the calibration performance of machine learning models. However, existing theoretical analyses of the bias induced by binning are limited to binary classification, creating a significant gap with practical applications such as multiclass classification. Additionally, many parametric recalibration algorithms lack theoretical guarantees for their generalization performance. To address these issues, we conduct a generalization analysis of calibration error using the probably approximately correct (PAC) Bayes framework. This approach enables us to derive the first optimizable upper bound for generalization error in the calibration context. Based on this theory, we propose a generalization-aware recalibration algorithm. Numerical experiments show that our algorithm enhances the performance of Gaussian process-based recalibration across various benchmark datasets and models.
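To make the uniform-width binning estimator concrete, here is a minimal sketch of a binned expected calibration error (ECE) for a multiclass classifier. It assumes the common top-label definition, in which only the confidence of the predicted class is binned. The abstract does not say which multiclass definition the paper analyzes, so this is illustrative rather than the authors' estimator. The function name `binned_ece` and the parameter `n_bins` are hypothetical.

```python
import numpy as np


def binned_ece(probs, labels, n_bins=10):
    """Top-label ECE with uniform-width bins (illustrative sketch only).

    probs  : array of shape (n_samples, n_classes), predicted probabilities
    labels : array of shape (n_samples,), integer class labels
    n_bins : number of equal-width bins on [0, 1] (assumed default)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    # Confidence is the probability assigned to the predicted class.
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)

    # Interior edges of the uniform-width bins; each bin is (lower, upper].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    n_samples = len(labels)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of samples that fall into the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n_samples * gap
    return ece
```

Calling `binned_ece(probs, labels, n_bins=15)` on held-out predictions returns a single scalar in [0, 1]. The estimate depends on `n_bins`, which is the binning-induced bias the abstract refers to.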