LG · AI · ME · ML · Jul 18, 2021

Top-label calibration and multiclass-to-binary reductions

arXiv:2107.08353v4 · 52 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses calibration issues in multiclass classification for machine learning practitioners, offering an incremental improvement over existing calibration techniques.

The paper argues that confidence calibration for multiclass classifiers is difficult to interpret and proposes top-label calibration as a rectification. It also introduces a multiclass-to-binary (M2B) reduction framework which, combined with histogram binning, achieves lower top-label and class-wise calibration error than existing methods such as temperature scaling on CIFAR-10 and CIFAR-100.

A multiclass classifier is said to be top-label calibrated if the reported probability for the predicted class (the top-label) is calibrated, conditioned on the top-label. This conditioning on the top-label is absent in the closely related and popular notion of confidence calibration, which we argue makes confidence calibration difficult to interpret for decision-making. We propose top-label calibration as a rectification of confidence calibration. Further, we outline a multiclass-to-binary (M2B) reduction framework that unifies confidence, top-label, and class-wise calibration, among others. As its name suggests, M2B works by reducing multiclass calibration to numerous binary calibration problems, each of which can be solved using simple binary calibration routines. We instantiate the M2B framework with the well-studied histogram binning (HB) binary calibrator, and prove that the overall procedure is multiclass calibrated without making any assumptions on the underlying data distribution. In an empirical evaluation with four deep net architectures on CIFAR-10 and CIFAR-100, we find that the M2B + HB procedure achieves lower top-label and class-wise calibration error than other approaches such as temperature scaling. Code for this work is available at https://github.com/aigen/df-posthoc-calibration.
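To make the reduction concrete, the following is a minimal sketch of a top-label variant of M2B with histogram binning, written from the description in the abstract. It is not the authors' implementation (see their repository for that). The function names `fit_top_label_hb` and `predict_top_label_hb`, the `n_bins` parameter, the choice of quantile-based (equal-mass) bins, and the fallback to the raw confidence for classes never predicted on the calibration set are all assumptions made for illustration.

```python
import numpy as np


def fit_top_label_hb(probs, labels, n_bins=10):
    """Fit one binary histogram-binning calibrator per predicted class.

    probs  : (n, k) array of predicted class probabilities (calibration set)
    labels : (n,) array of true class indices
    Returns a dict mapping class index -> (bin_edges, bin_means).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pred = probs.argmax(axis=1)   # top-label
    conf = probs.max(axis=1)      # reported probability for the top-label

    calibrators = {}
    for c in np.unique(pred):
        # Binary problem for class c: among points predicted as c,
        # was the prediction correct?
        mask = pred == c
        conf_c = conf[mask]
        hit_c = (labels[mask] == c).astype(float)

        # Interior quantile edges give roughly equal-mass bins (an assumption).
        edges = np.quantile(conf_c, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        bin_idx = np.searchsorted(edges, conf_c, side="right")

        # Calibrated value of a bin = empirical accuracy inside it;
        # empty bins fall back to the overall accuracy for class c.
        means = np.array([
            hit_c[bin_idx == b].mean() if np.any(bin_idx == b) else hit_c.mean()
            for b in range(n_bins)
        ])
        calibrators[c] = (edges, means)
    return calibrators


def predict_top_label_hb(calibrators, probs):
    """Return (top-label, recalibrated top-label probability) for each row."""
    probs = np.asarray(probs, dtype=float)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)

    # Classes never predicted during fitting keep their raw confidence.
    out = conf.copy()
    for c, (edges, means) in calibrators.items():
        mask = pred == c
        out[mask] = means[np.searchsorted(edges, conf[mask], side="right")]
    return pred, out
```

A typical use would be to call `fit_top_label_hb` on held-out calibration data and then `predict_top_label_hb` on test-time probabilities. Because a separate calibrator is fit for each predicted class, the recalibrated value depends on both the confidence and the top-label, which is the conditioning that distinguishes top-label calibration from confidence calibration.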

Code Implementations: 1 repo