LG · Aug 25, 2022

Calibrated Selective Classification

arXiv:2208.12084v2 · 38 citations · h-index: 109
Originality: Incremental advance
AI Analysis

This work addresses overconfident and underconfident predictions in machine learning models, a problem that matters most in applications such as medical diagnosis, where calibrated uncertainties are critical. The contribution is incremental, since it builds on existing selective classification and calibration methods.

The paper tackles unreliable uncertainty estimates in selective classification by developing a method that rejects examples whose uncertainty estimates are themselves "uncertain". The goal is selective calibration: well-calibrated uncertainty estimates over the accepted examples. The approach is validated empirically on image classification and lung cancer risk assessment tasks.
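
To make "selective calibration" concrete, here is a minimal sketch of how one might measure it, assuming binned top-label expected calibration error (ECE) as the calibration measure and assuming that the selector accepts the top `coverage` fraction of examples by score. The function names `expected_calibration_error` and `selective_calibration_error` are hypothetical illustrations, not the paper's code, and the paper's own estimator may differ.

```python
from typing import List, Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               num_bins: int = 10) -> float:
    """Binned top-label ECE: bin-size-weighted mean of |accuracy - confidence|."""
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i, c in enumerate(confidences):
        # Equal-width bins on [0, 1]; clamp c == 1.0 into the last bin.
        bins[min(int(c * num_bins), num_bins - 1)].append(i)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece


def selective_calibration_error(confidences: Sequence[float],
                                correct: Sequence[bool],
                                selector_scores: Sequence[float],
                                coverage: float,
                                num_bins: int = 10) -> float:
    """ECE computed only over accepted examples.

    Assumption: the selector accepts the top `coverage` fraction of examples
    ranked by its score and abstains on the rest.
    """
    n = len(confidences)
    k = max(1, round(coverage * n))
    order = sorted(range(n), key=lambda i: selector_scores[i], reverse=True)
    kept = order[:k]
    return expected_calibration_error([confidences[i] for i in kept],
                                      [correct[i] for i in kept], num_bins)


# Toy data: the last prediction is confidently wrong (0.95 but incorrect).
confs = [0.9, 0.9, 0.6, 0.95]
correct = [True, True, True, False]
scores = [0.8, 0.7, 0.6, 0.1]  # hypothetical selector ranks the bad case last

full = expected_calibration_error(confs, correct)
selective = selective_calibration_error(confs, correct, scores, coverage=0.75)
print(f"full: {full:.4f}, selective: {selective:.4f}")  # full: 0.2875, selective: 0.2000
```

Rejecting the single overconfident error lowers calibration error on the accepted set, even though the accepted set still contains an underconfident prediction (0.6 confidence, correct). Note that the selector here is a fixed score; in the paper a separate selector network is learned.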

Selective classification allows models to abstain from making predictions (e.g., say "I don't know") when in doubt in order to obtain better effective accuracy. While typical selective models can be effective at producing more accurate predictions on average, they may still allow for wrong predictions that have high confidence, or skip correct predictions that have low confidence. Providing calibrated uncertainty estimates alongside predictions -- probabilities that correspond to true frequencies -- can be as important as having predictions that are simply accurate on average. However, uncertainty estimates can be unreliable for certain inputs. In this paper, we develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties. By doing so, we aim to make predictions with well-calibrated uncertainty estimates over the distribution of accepted examples, a property we call selective calibration. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. In particular, our work focuses on achieving robust calibration, where the model is intentionally designed to be tested on out-of-domain data. We achieve this through a training strategy inspired by distributionally robust optimization, in which we apply simulated input perturbations to the known, in-domain training data. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
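
The distributionally robust idea can be sketched as an objective: apply a family of simulated perturbations to in-domain data, compute the selective calibration error of a fixed base model plus a selector on each perturbed copy, and take the worst case. This is a minimal sketch under several assumptions. It only evaluates such an objective and does not train a selector network; binned ECE is used as a stand-in metric; and the toy logistic `toy_model`, the `margin_selector`, and the perturbations `shift` and `add_noise` are all hypothetical, not taken from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Features = List[float]


def selective_ece(confs: Sequence[float], correct: Sequence[bool],
                  scores: Sequence[float], coverage: float,
                  num_bins: int = 10) -> float:
    """Binned ECE over the top-`coverage` fraction of examples by selector score."""
    n_keep = max(1, round(coverage * len(confs)))
    kept = sorted(range(len(confs)), key=lambda i: scores[i], reverse=True)[:n_keep]
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i in kept:
        bins[min(int(confs[i] * num_bins), num_bins - 1)].append(i)
    error = 0.0
    for b in bins:
        if b:
            acc = sum(correct[i] for i in b) / len(b)
            conf = sum(confs[i] for i in b) / len(b)
            error += len(b) / n_keep * abs(acc - conf)
    return error


def robust_selective_ece(data: Sequence[Tuple[Features, int]],
                         base_model: Callable[[Features], List[float]],
                         selector: Callable[[Features], float],
                         perturbations: Sequence[Callable[[Features], Features]],
                         coverage: float) -> float:
    """Worst-case selective ECE over simulated perturbations of in-domain data."""
    worst = 0.0
    for perturb in perturbations:
        confs: List[float] = []
        correct: List[bool] = []
        scores: List[float] = []
        for x, y in data:
            x_shifted = perturb(x)               # simulated input perturbation
            probs = base_model(x_shifted)        # fixed base model, not retrained
            pred = max(range(len(probs)), key=lambda k: probs[k])
            confs.append(probs[pred])
            correct.append(pred == y)
            scores.append(selector(x_shifted))   # selector sees the same shifted input
        worst = max(worst, selective_ece(confs, correct, scores, coverage))
    return worst


# --- Hypothetical toy setup (not from the paper) ---
def toy_model(x: Features) -> List[float]:
    """1-D logistic 'base model' returning [P(y=0), P(y=1)]."""
    p1 = 1.0 / (1.0 + math.exp(-3.0 * x[0]))
    return [1.0 - p1, p1]


def margin_selector(x: Features) -> float:
    """Hypothetical selector: larger |x| means accept earlier."""
    return abs(x[0])


rng = random.Random(0)
data = [([rng.gauss(1.0 if y else -1.0, 1.0)], y) for y in [0, 1] * 100]


def identity(x: Features) -> Features:
    return x


def shift(x: Features) -> Features:
    return [x[0] + 0.8]  # hypothetical covariate shift


def add_noise(x: Features) -> Features:
    return [x[0] + rng.gauss(0.0, 1.5)]  # hypothetical input noise


in_domain = robust_selective_ece(data, toy_model, margin_selector, [identity], coverage=0.8)
worst_case = robust_selective_ece(data, toy_model, margin_selector,
                                  [identity, shift, add_noise], coverage=0.8)
print(f"in-domain: {in_domain:.3f}, worst case over perturbations: {worst_case:.3f}")
```

Taking the maximum over perturbed copies is one DRO-style aggregation; averaging is a softer alternative, and the source does not specify which form the paper uses. Actually training a selector against such an objective would also require replacing binned ECE with a differentiable surrogate, which this sketch leaves out.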

Code Implementations: 1 repo