ML LG | May 20, 2022

On the Calibration of Probabilistic Classifier Sets

Thomas Mortier, Viktor Bengs, Eyke Hüllermeier, Stijn Luca, Willem Waegeman

arXiv:2205.10082v2 | 12.48 citations | h-index: 69

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable uncertainty quantification in machine learning, particularly for ensemble methods, but it is incremental as it builds on existing calibration concepts.

The paper addresses how to evaluate the calibration of the epistemic uncertainty represented by sets of probabilistic classifiers, such as ensembles. It extends the notion of calibration from single classifiers to sets and proposes a nonparametric test for it. Applied empirically, the test indicates that ensembles of deep neural networks are often not well calibrated.

Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via the size of the set. In this paper, we extend the notion of calibration, which is commonly used to evaluate the validity of the aleatoric uncertainty representation of a single probabilistic classifier, to assess the validity of an epistemic uncertainty representation obtained by sets of probabilistic classifiers. Broadly speaking, we call a set of probabilistic classifiers calibrated if one can find a calibrated convex combination of these classifiers. To evaluate this notion of calibration, we propose a novel nonparametric calibration test that generalizes an existing test for single probabilistic classifiers to the case of sets of probabilistic classifiers. Making use of this test, we empirically show that ensembles of deep neural networks are often not well calibrated.
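
The idea of a "calibrated convex combination" in the abstract can be illustrated with a small toy sketch. It is not the paper's nonparametric test: it substitutes the familiar confidence-based expected calibration error (ECE) as a stand-in measure, and it assumes constant mixture weights shared across all instances, which is a simplification. All function names, the two hypothetical ensemble members, and the toy data are assumptions made for illustration.

```python
# Toy sketch (assumptions: ECE as a stand-in calibration measure, constant
# mixture weights, two hypothetical members). Not the paper's test.

def convex_combination(member_probs, weights):
    """Mix members' predictions; member_probs[m][n][k], weights on the simplex."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    n_instances = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [sum(w * member_probs[m][n][k] for m, w in enumerate(weights))
         for k in range(n_classes)]
        for n in range(n_instances)
    ]

def expected_calibration_error(probs, labels, n_bins=10):
    """Confidence ECE: bin by top-class probability, compare accuracy to confidence."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            ece += len(b) / len(labels) * abs(accuracy - avg_conf)
    return ece

def best_two_member_mixture(member_probs, labels, steps=10):
    """Grid search over weights (lam, 1 - lam) for exactly two members."""
    candidates = []
    for i in range(steps + 1):
        lam = i / steps
        mixed = convex_combination(member_probs, [lam, 1.0 - lam])
        candidates.append((expected_calibration_error(mixed, labels), lam))
    ece, lam = min(candidates)
    return lam, ece

# Toy binary data: class 1 occurs in exactly 70% of 100 instances.
labels = [1] * 70 + [0] * 30
member_a = [[0.2, 0.8]] * 100   # hypothetical overconfident member
member_b = [[0.4, 0.6]] * 100   # hypothetical underconfident member

ece_a = expected_calibration_error(member_a, labels)
ece_b = expected_calibration_error(member_b, labels)
best_lam, best_ece = best_two_member_mixture([member_a, member_b], labels)
print(ece_a, ece_b, best_lam, best_ece)
```

In this toy setting neither member is calibrated on its own, yet an equal-weight mixture predicts 0.7 for class 1 and matches the observed frequency, so the set would count as calibrated in the loose sense sketched here. A grid search over weights does not scale beyond a few members and is only meant to make the definition concrete.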
