ML AI CV LG

Dec 27, 2022

Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks

Erdong Guo, David Draper, Maria De Iorio

arXiv:2212.13621v23.81 citations

h-index: 19

Originality: Incremental advance

AI Analysis

The paper addresses model calibration, which matters for applications such as optimal decision-making. The contribution is incremental: it builds on existing calibration methods and adds a novel architectural tweak.

The paper tackles poor calibration in deep neural networks, where models over- or underestimate predictive confidence. It proposes the Annealing Double-Head architecture, which the authors report achieves state-of-the-art calibration performance without post-processing while maintaining comparable predictive accuracy across various tasks and datasets.
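To make "over- or underestimating predictive confidence" concrete, one common way to quantify miscalibration is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. This page does not say which metric the paper reports, so the function name, the 10-bin default, and the toy inputs below are assumptions for illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over confidence bins.

    confidences: predicted confidence in [0, 1] for each example.
    correct:     1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Half-open bins [i/n_bins, (i+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy data (made up): an overconfident model claims 95% but is right half the time.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 0, 0])
# A calibrated model claims 75% and is right three times out of four.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
```

In this toy case the overconfident model has a gap of 0.95 - 0.5 = 0.45, while the calibrated one has a gap of zero. A lower ECE means confidence tracks accuracy more closely.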

Model calibration, which concerns how well a model's predicted confidence matches how frequently it predicts correctly, not only plays a vital part in statistical model design but also has substantial practical applications, such as optimal decision-making in the real world. However, modern deep neural networks (DNNs) are generally poorly calibrated because they overestimate (or underestimate) predictive confidence, a problem closely related to overfitting. In this paper, we propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating a DNN during training. To be precise, we construct an additional calibration head (a shallow neural network that typically has one latent layer) on top of the last latent layer of the normal model to map the logits to the aligned confidence. Furthermore, we develop a simple annealing technique that dynamically scales the logits by the calibration head during training to improve its performance. Under both in-distribution and distributional-shift settings, we extensively evaluate our Annealing Double-Head architecture on multiple pairs of contemporary DNN architectures and vision and speech datasets. We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing while providing predictive accuracy comparable to other recently proposed calibration methods on a range of learning tasks.
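The two-head idea in the abstract can be sketched as a forward pass. The abstract does not specify what the calibration head outputs, how its output is combined with the logits, or the annealing schedule, so everything below is an assumption for illustration and not the authors' implementation. Here the calibration head is assumed to emit one positive scale per example, and a hypothetical linear schedule `anneal_factor` multiplies that scale. The class name, layer sizes, and random weights are likewise made up.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def anneal_factor(step, total_steps, start=0.1, end=1.0):
    """ASSUMPTION: a linear schedule from `start` to `end`; the paper's schedule may differ."""
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac


class DoubleHeadSketch:
    """Hypothetical two-head forward pass on top of a backbone's last latent layer."""

    def __init__(self, feat_dim, n_classes, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Main head: linear map from latent features to class logits.
        self.W_main = rng.normal(0.0, 0.1, (feat_dim, n_classes))
        self.b_main = np.zeros(n_classes)
        # Calibration head: a shallow network with one latent (hidden) layer.
        self.W1 = rng.normal(0.0, 0.1, (feat_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, 1))
        self.b2 = np.zeros(1)

    def forward(self, features, anneal=1.0):
        logits = features @ self.W_main + self.b_main
        hidden = np.maximum(0.0, features @ self.W1 + self.b1)  # ReLU
        raw = hidden @ self.W2 + self.b2
        scale = np.logaddexp(0.0, raw)  # softplus, so the scale stays positive
        # ASSUMPTION: calibrated confidence = softmax of the rescaled logits.
        calibrated = softmax(anneal * scale * logits)
        return softmax(logits), calibrated


# Usage with random stand-in features (5 examples, 8 latent dimensions).
features = np.random.default_rng(1).normal(size=(5, 8))
model = DoubleHeadSketch(feat_dim=8, n_classes=3)
main_probs, calib_probs = model.forward(features, anneal=anneal_factor(50, 100))
```

Under these assumptions the scale is a positive scalar per example, so the predicted class is unchanged and only the confidence is adjusted. That matches the abstract's claim of comparable accuracy in spirit, though the actual mechanism in the paper may differ.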
