Calibration through the Lens of Indistinguishability
This work provides a theoretical framework for evaluating probabilistic predictions in machine learning. Its contribution is incremental: it synthesizes recent research rather than introducing new methods.
This survey addresses the problem of interpreting predicted probabilities. It explores foundational questions of how to define and measure calibration error and what these measures imply for downstream decision-making, and it presents a unifying view of calibration as indistinguishability between the predictor's hypothesized world and the real world.
Calibration is a classical notion from the forecasting literature that aims to address the question: how should predicted probabilities be interpreted? In a world where we only get to observe (discrete) outcomes, how should we evaluate a predictor that hypothesizes (continuous) probabilities over possible outcomes? The study of calibration has seen a surge of recent interest, given the ubiquity of probabilistic predictions in machine learning. This survey describes recent work on the foundational questions of how to define and measure calibration error, and what these measures mean for downstream decision makers who wish to act on the predictions. A unifying viewpoint that emerges is that of calibration as a form of indistinguishability between the world hypothesized by the predictor and the real world (governed by nature or the Bayes optimal predictor). In this view, various calibration measures quantify the extent to which the two worlds can be told apart by certain classes of distinguishers or statistical measures.
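To make the idea of measuring calibration error concrete, the following is a minimal sketch of one commonly used measure, the binned expected calibration error (ECE) for binary outcomes. It is an illustration under stated assumptions, not necessarily a measure the survey recommends: the function names `binned_ece` and `simulate`, the choice of 10 equal-width bins, and the synthetic "overconfident" predictor are all hypothetical choices made here. Read through the indistinguishability lens, each bin acts as a simple distinguisher: it checks whether the observed outcomes among predictions in that bin look like draws from the probabilities the predictor hypothesized.

```python
import random


def binned_ece(probs, outcomes, n_bins=10):
    """Binned expected calibration error for binary outcomes in {0, 1}.

    Groups predictions into equal-width bins and returns the
    count-weighted average of |mean outcome - mean prediction| per bin.
    """
    sums_p = [0.0] * n_bins
    sums_y = [0.0] * n_bins
    for p, y in zip(probs, outcomes):
        # A prediction of exactly 1.0 is placed in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums_p[b] += p
        sums_y[b] += y
    n = len(probs)
    # |sum(y) - sum(p)| / n per bin equals (bin count / n) * |mean gap|.
    return sum(abs(sy - sp) for sy, sp in zip(sums_y, sums_p)) / n


def simulate(n=20000, seed=0):
    """Compare a calibrated and an overconfident predictor on synthetic data.

    Assumption for illustration: nature draws a true probability uniformly
    at random for each example, then draws a binary outcome from it.
    """
    rng = random.Random(seed)
    true_p = [rng.random() for _ in range(n)]
    outcomes = [1 if rng.random() < p else 0 for p in true_p]
    # The calibrated predictor reports the true probabilities.
    calibrated = true_p
    # The overconfident predictor pushes predictions away from 0.5.
    overconfident = [min(1.0, max(0.0, 0.5 + 1.5 * (p - 0.5))) for p in true_p]
    return binned_ece(calibrated, outcomes), binned_ece(overconfident, outcomes)


if __name__ == "__main__":
    ece_cal, ece_over = simulate()
    print(f"calibrated ECE:    {ece_cal:.4f}")
    print(f"overconfident ECE: {ece_over:.4f}")
```

In this sketch the calibrated predictor's ECE is small but not zero, because of sampling noise in finite data, while the overconfident predictor's ECE is noticeably larger. The binned estimate also depends on the number and placement of bins, which is one reason the question of how to define and measure calibration error is not settled by this single formula.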