LG | CV | Mar 6, 2023

Rethinking Confidence Calibration for Failure Prediction

arXiv:2303.02970v1 | 60 citations | h-index: 68 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses unreliable confidence estimation in safety-critical applications, a practical concern for AI practitioners. The advance is incremental, since it builds on existing research on calibration and flat minima.

The paper finds that most confidence calibration methods are ineffective or even detrimental for failure prediction, because they worsen the separation between the confidence of correct and incorrect predictions. It proposes that flat minima improve failure prediction, and its experiments show further gains when two flat minima techniques are combined.

Reliable confidence estimation for predictions is important in many safety-critical applications. However, modern deep neural networks are often overconfident in their incorrect predictions. Recently, many calibration methods have been proposed to alleviate this overconfidence problem. A primary and practical purpose of calibrated confidence is to detect misclassification errors by filtering out low-confidence predictions (known as failure prediction). In this paper, we find a general, widespread, but largely neglected phenomenon: most confidence calibration methods are useless or harmful for failure prediction. We investigate this problem and reveal that popular confidence calibration methods often lead to worse confidence separation between correct and incorrect samples, making it more difficult to decide whether to trust a prediction. Finally, inspired by the natural connection between flat minima and confidence separation, we propose a simple hypothesis: flat minima are beneficial for failure prediction. We verify this hypothesis via extensive experiments and further boost performance by combining two different flat minima techniques. Our code is available at https://github.com/Impression2805/FMFP
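
To make the notions of failure prediction and confidence separation in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the paper's repository. It assumes the confidence score is the maximum softmax probability and measures separation with AUROC, which are common choices but assumptions here. The logits, labels, threshold, and all function names are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits):
    """Return (predicted class, max softmax probability) for one sample."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs[pred]

def auroc(conf_correct, conf_incorrect):
    """Probability that a correct prediction gets higher confidence than
    an incorrect one (ties count 0.5). 1.0 means perfect separation."""
    wins = 0.0
    for c in conf_correct:
        for w in conf_incorrect:
            if c > w:
                wins += 1.0
            elif c == w:
                wins += 0.5
    return wins / (len(conf_correct) * len(conf_incorrect))

def filter_by_confidence(confidences, is_correct, threshold):
    """Failure prediction as filtering: keep predictions whose confidence
    is at least `threshold`; report coverage and accuracy on the kept set."""
    kept = [ok for conf, ok in zip(confidences, is_correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy

# Hypothetical toy logits and labels (illustrative only, not paper data).
toy_logits = [
    [4.0, 0.5, 0.1],
    [0.2, 3.0, 0.3],
    [1.0, 1.2, 0.9],
    [2.5, 0.1, 2.4],
    [0.1, 0.2, 3.5],
]
toy_labels = [0, 1, 0, 2, 2]

results = [predict_with_confidence(z) for z in toy_logits]
confidences = [conf for _, conf in results]
is_correct = [pred == label for (pred, _), label in zip(results, toy_labels)]

separation = auroc(
    [c for c, ok in zip(confidences, is_correct) if ok],
    [c for c, ok in zip(confidences, is_correct) if not ok],
)
coverage, accuracy = filter_by_confidence(confidences, is_correct, threshold=0.6)
print(f"AUROC={separation:.3f} coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

In this reading, a calibration method can lower overall overconfidence and still reduce the AUROC if it pulls the confidences of correct and incorrect predictions closer together, which is the failure mode the abstract describes.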

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
