LG AI | Aug 6, 2023

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

Shuang Ao, Stefan Rueger, Advaith Siddharthan

arXiv:2308.03172v114.922 citations | h-index: 26 | Has Code

Originality: Highly original

AI Analysis

This work addresses the under-confidence issue in model calibration for safety-critical tasks, covering both over- and under-confidence where prior calibration techniques concentrated on over-confidence.

The paper tackles miscalibration in deep neural networks, which covers both over-confidence and under-confidence. It introduces a novel metric to identify the two cases and a calibration technique that addresses both, reporting substantial improvements over existing methods and better failure detection.

Proper confidence calibration of deep neural networks is essential for reliable predictions in safety-critical tasks. Miscalibration can lead to model over-confidence and/or under-confidence; i.e., the model's confidence in its prediction can be greater or less than the model's accuracy. Recent studies have highlighted the over-confidence issue by introducing calibration techniques and demonstrated success on various tasks. However, miscalibration through under-confidence has yet to receive much attention. In this paper, we address the necessity of paying attention to the under-confidence issue. We first introduce a novel metric, a miscalibration score, to identify the overall and class-wise calibration status, including being over- or under-confident. Our proposed metric reveals the pitfalls of existing calibration techniques, which often over-calibrate the model and worsen under-confident predictions. Then we utilize the class-wise miscalibration score as a proxy to design a calibration technique that can tackle both over- and under-confidence. We report extensive experiments that show our proposed methods substantially outperform existing calibration techniques. We also validate our proposed calibration technique on an automatic failure detection task with a risk-coverage curve, reporting that our methods improve failure detection as well as the trustworthiness of the model. The code is available at https://github.com/AoShuang92/miscalibration_TS.
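To make the over- versus under-confidence distinction concrete, here is a minimal sketch of a signed calibration gap: mean confidence minus accuracy, computed overall and per predicted class. This is an illustrative assumption, not the paper's miscalibration score, whose exact definition is not given in the abstract. The function names `signed_gap` and `classwise_signed_gap`, the choice to group by predicted class, and the toy data are all hypothetical.

```python
from collections import defaultdict


def signed_gap(confidences, predictions, labels):
    """Mean confidence minus accuracy.

    Positive values suggest over-confidence, negative values under-confidence.
    Assumed illustrative definition, not the paper's miscalibration score.
    """
    if len(confidences) == 0:
        raise ValueError("need at least one prediction")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return mean_conf - accuracy


def classwise_signed_gap(confidences, predictions, labels):
    """Signed gap over the samples predicted as each class (assumed grouping)."""
    groups = defaultdict(list)
    for conf, pred, label in zip(confidences, predictions, labels):
        groups[pred].append((conf, pred, label))
    # zip(*rows) turns the list of triples back into three parallel sequences.
    return {cls: signed_gap(*zip(*rows)) for cls, rows in groups.items()}


# Toy data (hypothetical): top-class confidence, predicted class, true class.
confidences = [0.9, 0.8, 0.6, 0.5]
predictions = [0, 0, 1, 1]
labels = [0, 1, 1, 1]

overall = signed_gap(confidences, predictions, labels)
per_class = classwise_signed_gap(confidences, predictions, labels)
print(f"overall gap: {overall:+.2f}")
for cls, gap in sorted(per_class.items()):
    status = "over-confident" if gap > 0 else "under-confident"
    print(f"class {cls}: {gap:+.2f} ({status})")
```

On this toy data the overall gap is close to zero, yet class 0 comes out over-confident and class 1 under-confident. A single aggregate number can therefore hide both kinds of miscalibration, which is the motivation the abstract gives for a class-wise score.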
