LG · AI · CV · Oct 16, 2024

Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors

Linwei Tao, Haolan Guo, Minjing Dong, Chang Xu

arXiv:2410.12295v1 · 7.93 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses miscalibration in safety-critical applications such as healthcare and autonomous driving, though it is an incremental improvement over existing methods.

The paper tackles miscalibration in deep neural networks by introducing a post-hoc calibration method, Consistency Calibration (CC), which adjusts confidence according to how consistent the model's predictions are across perturbed inputs. It reports state-of-the-art calibration performance on datasets such as CIFAR-10, CIFAR-100, and ImageNet.

Calibration is crucial in deep learning applications, especially in fields like healthcare and autonomous driving, where accurate confidence estimates are vital for decision-making. However, deep neural networks often suffer from miscalibration, and reliability diagrams and Expected Calibration Error (ECE) provide the only standard perspective for evaluating calibration performance. In this paper, we introduce the concept of consistency as an alternative perspective on model calibration, inspired by the uncertainty estimation literature for large language models (LLMs), and we highlight its advantages over the traditional reliability-based view. Building on this concept, we propose a post-hoc calibration method called Consistency Calibration (CC), which adjusts confidence based on the model's consistency across perturbed inputs. CC is particularly effective for local uncertainty estimation because it requires no additional data samples or label information; instead, it generates input perturbations directly from the source data. Moreover, we show that performing perturbations at the logit level significantly improves computational efficiency. We validate the effectiveness of CC through extensive comparisons with various post-hoc and training-time calibration methods, demonstrating state-of-the-art performance on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet, as well as on long-tailed datasets such as ImageNet-LT.
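The abstract describes CC only at a high level, so the following is a minimal sketch of one hypothetical reading of logit-level consistency, not the authors' implementation. It assumes i.i.d. uniform noise of half-width `eps` is added to every logit, takes the fraction of noisy copies whose argmax matches the unperturbed prediction as the confidence, and scores the result with a standard equal-width-bin ECE. The function names `consistency_confidence` and `expected_calibration_error`, the parameters `eps`, `num_samples`, and `n_bins`, and the choice of noise distribution are all illustrative assumptions.

```python
import numpy as np


def consistency_confidence(logits, eps=1.0, num_samples=200, seed=0):
    """Hypothetical sketch: confidence = agreement between the original
    prediction and predictions on noisy copies of the logits.

    logits: array of shape (N, C); eps: half-width of the uniform noise.
    Returns (pred, conf), each of shape (N,).
    """
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float)
    pred = logits.argmax(axis=1)  # unperturbed predictions
    # Assumption: i.i.d. uniform noise in [-eps, eps] is added to each logit.
    noise = rng.uniform(-eps, eps, size=(num_samples,) + logits.shape)
    noisy_pred = (logits[None, :, :] + noise).argmax(axis=2)  # shape (S, N)
    # Fraction of perturbed copies that keep the original predicted class.
    conf = (noisy_pred == pred[None, :]).mean(axis=0)
    return pred, conf


def expected_calibration_error(conf, pred, labels, n_bins=15):
    """Equal-width-bin ECE: sum over bins of |accuracy - mean confidence|,
    weighted by the fraction of samples that fall in each bin."""
    conf = np.asarray(conf, dtype=float)
    correct = (np.asarray(pred) == np.asarray(labels)).astype(float)
    # Map each confidence in [0, 1] to a bin index; 1.0 goes to the last bin.
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    logits = rng.normal(size=(1000, 10)) * 3.0  # synthetic logits
    labels = rng.integers(0, 10, size=1000)     # synthetic labels
    pred, conf = consistency_confidence(logits)
    print("ECE with consistency confidence:",
          expected_calibration_error(conf, pred, labels))
```

Because the noise is applied to logits rather than to raw inputs, each extra sample costs only an addition and an argmax instead of a full forward pass, which matches the efficiency motivation stated in the abstract. `n_bins=15` is a common choice for ECE, not a requirement.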
