LGCVMar 9, 2023

Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift

arXiv:2303.05331v13 citationsh-index: 13
AI Analysis

This addresses calibration degradation in out-of-distribution scenarios for machine learning models, representing an incremental improvement over existing methods.

The paper tackles model calibration under distribution shift by proposing an adaptive calibrator ensemble (ACE) that balances calibration functions for in-distribution and out-of-distribution data, improving performance on OOD benchmarks without compromising in-distribution accuracy.

Model calibration usually requires optimizing some parameters (e.g., temperature) w.r.t an objective function (e.g., negative log-likelihood). In this paper, we report a plain, important but often neglected fact that the objective function is influenced by calibration set difficulty, i.e., the ratio of the number of incorrectly classified samples to that of correctly classified samples. If a test set has a drastically different difficulty level from the calibration set, the optimal calibration parameters of the two datasets would be different. In other words, a calibrator optimal on the calibration set would be suboptimal on the OOD test set and thus has degraded performance. With this knowledge, we propose a simple and effective method named adaptive calibrator ensemble (ACE) to calibrate OOD datasets whose difficulty is usually higher than the calibration set. Specifically, two calibration functions are trained, one for in-distribution data (low difficulty), and the other for severely OOD data (high difficulty). To achieve desirable calibration on a new OOD dataset, ACE uses an adaptive weighting method that strikes a balance between the two extreme functions. When plugged in, ACE generally improves the performance of a few state-of-the-art calibration schemes on a series of OOD benchmarks. Importantly, such improvement does not come at the cost of the in-distribution calibration accuracy.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes