LG | ML | Oct 28, 2019

Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

arXiv:1910.12656v1 | 527 citations
Originality: Highly original
AI Analysis

The paper addresses unreliable probability estimates in multiclass classification. For users of machine learning models, it offers a calibration approach that is more general than existing methods such as temperature scaling and, in the reported experiments, more effective.

The paper tackles the problem of uncalibrated multiclass probabilities, which often tend towards over-confidence, by proposing a natively multiclass calibration method based on Dirichlet distributions that generalizes beta calibration from binary classification. The results show improved probabilistic predictions across multiple measures (confidence-ECE, classwise-ECE, log-loss, and Brier score) on a wide range of datasets and classifiers.
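The four measures named above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's evaluation code: the function names, the 15 equal-width bins, and the sum-over-classes Brier convention are all assumptions made here.

```python
import numpy as np


def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))


def brier_score(probs, labels):
    """Mean squared distance to the one-hot label (summed over classes; an assumed convention)."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


def _binned_gap(scores, hits, n_bins):
    """Weighted mean |observed frequency - mean score| over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only: indices run from 0 to n_bins - 1, bins are right-closed.
    idx = np.digitize(scores, edges[1:-1], right=True)
    gap = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap += mask.mean() * abs(hits[mask].mean() - scores[mask].mean())
    return float(gap)


def confidence_ece(probs, labels, n_bins=15):
    """Calibration error of the top-1 confidence against top-1 accuracy."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    return _binned_gap(conf, correct, n_bins)


def classwise_ece(probs, labels, n_bins=15):
    """Binned calibration error computed per class, then averaged over classes."""
    k = probs.shape[1]
    onehot = np.eye(k)[labels]
    return float(np.mean([_binned_gap(probs[:, j], onehot[:, j], n_bins)
                          for j in range(k)]))
```

Confidence-ECE only looks at the predicted class, whereas classwise-ECE checks every class's probability column, so a model can score well on the former while still misallocating probability among the non-predicted classes.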

Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method to learn a single corrective multiplicative factor for inputs to the last softmax layer. On non-neural models the existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insights to the biases in the uncalibrated model.
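The abstract's description of the calibration map (log-transform the uncalibrated probabilities, then apply one linear layer and a softmax) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the plain gradient descent with step halving, the identity initialisation, and the absence of any regularisation are all choices made here for brevity.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def apply_dirichlet_calibration(probs, W, b, eps=1e-12):
    """Calibrated probabilities: softmax(W ln(p) + b)."""
    x = np.log(np.clip(probs, eps, 1.0))
    return softmax(x @ W.T + b)


def mean_log_loss(probs, labels, eps=1e-12):
    """Mean negative log-probability of the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))


def fit_dirichlet_calibration(probs, labels, steps=300, lr=0.1):
    """Fit W (k x k) and b (k) by minimising log-loss on held-out data.

    Assumed optimiser: gradient descent that halves the step size whenever
    a step would increase the loss, so the loss never goes up.
    """
    n, k = probs.shape
    onehot = np.eye(k)[labels]
    x = np.log(np.clip(probs, 1e-12, 1.0))
    W, b = np.eye(k), np.zeros(k)  # identity map: output equals input
    loss = mean_log_loss(softmax(x @ W.T + b), labels)
    for _ in range(steps):
        # Gradient of mean cross-entropy with respect to the pre-softmax scores.
        g = (softmax(x @ W.T + b) - onehot) / n
        grad_W, grad_b = g.T @ x, g.sum(axis=0)
        while lr > 1e-10:
            W_new, b_new = W - lr * grad_W, b - lr * grad_b
            new_loss = mean_log_loss(softmax(x @ W_new.T + b_new), labels)
            if new_loss <= loss:
                break
            lr *= 0.5
        else:
            break  # no step size reduced the loss; stop early
        W, b, loss = W_new, b_new, new_loss
    return W, b
```

In this parameterisation, temperature scaling corresponds to the special case where W is the identity divided by a single temperature and b is zero, since adding the same constant to every log-probability leaves the softmax unchanged. The fit should use a held-out calibration set rather than the classifier's training data; afterwards, inspecting the learned W and b is one way to read off the biases the abstract mentions.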

Code Implementations | 3 repos