CV · LG · May 1, 2019

Unsupervised Temperature Scaling: An Unsupervised Post-Processing Calibration Method of Deep Networks

arXiv:1905.00174v3 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable confidence estimation in high-stakes domains like healthcare, offering an incremental improvement over existing supervised calibration methods by removing the dependency on labeled data.

Deep neural networks often produce poorly calibrated confidence scores, a critical weakness in applications such as medical diagnosis. The paper proposes an unsupervised temperature scaling method that calibrates models without labeled data and reports competitive calibration performance across various datasets and models.

Deep learning achieves impressive results across a wide range of tasks. However, the output confidence of these models is usually not well calibrated, which is a problem for applications where confidence in the decisions is central to trust and reliability (e.g., autonomous driving or medical diagnosis). For models that use a softmax at the last layer, Temperature Scaling (TS) is a state-of-the-art calibration method, with low time and memory complexity as well as demonstrated effectiveness. TS relies on a temperature parameter T to rescale and calibrate the values of the softmax layer; the value of T is computed from a labelled dataset. We propose an Unsupervised Temperature Scaling (UTS) approach that does not depend on labelled samples to calibrate the model. This allows, for example, part of the test samples to be used to calibrate the pre-trained model before going into inference mode. We provide theoretical justifications for UTS and assess its effectiveness on a wide range of deep models and datasets. We also demonstrate calibration results of UTS on skin lesion detection, a problem where a well-calibrated output can play an important role in accurate decision-making.
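
To make the role of T concrete, here is a minimal NumPy sketch of standard, supervised Temperature Scaling as the abstract describes it: the logits are divided by T before the softmax, and T is chosen on a labelled set. It is not the UTS procedure, whose objective the abstract does not spell out. The function names, the grid search, and the use of negative log-likelihood as the fitting criterion are assumptions made for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Row-wise softmax of logits / T, computed in a numerically stable way."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max to avoid overflow
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    probs = softmax_with_temperature(logits, T)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs + 1e-12))

def fit_temperature(logits, labels, candidates=None):
    """Hypothetical helper: pick T from a grid by minimising NLL on labelled data."""
    if candidates is None:
        # Assumed search range; T > 1 softens the distribution, T < 1 sharpens it.
        candidates = np.linspace(0.05, 10.0, 200)
    losses = [nll(logits, labels, T) for T in candidates]
    return float(candidates[int(np.argmin(losses))])
```

Given an array `logits` of shape (n_samples, n_classes) and integer `labels`, `fit_temperature(logits, labels)` returns a scalar T, and `softmax_with_temperature(logits, T)` gives the rescaled confidences. Because dividing by a positive T preserves the ordering of the logits, the predicted class does not change and only the confidence does.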

