LG ML | Feb 8, 2023

On the Richness of Calibration

arXiv:2302.04118v2 | 13.012 citations | h-index: 49

Originality: Incremental advance

AI Analysis

This work provides a unifying framework for calibration evaluation. The advance is incremental, but it clarifies and extends existing calibration-based methods in algorithmic fairness.

The paper addresses how to evaluate probabilistic predictions through calibration. It organizes the design choices behind calibration scores into a framework that supports comparing existing scores and formulating new ones with desirable mathematical properties, and it shows that appropriate grouping choices yield fairness measures for (sub-)groups or individuals.

Probabilistic predictions can be evaluated through comparisons with observed label frequencies, that is, through the lens of calibration. Recent scholarship on algorithmic fairness has started to look at a growing variety of calibration-based objectives under the name of multi-calibration but has still remained fairly restricted. In this paper, we explore and analyse forms of evaluation through calibration by making explicit the choices involved in designing calibration scores. We organise these into three grouping choices and a choice concerning the agglomeration of group errors. This provides a framework for comparing previously proposed calibration scores and helps to formulate novel ones with desirable mathematical properties. In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions and formally demonstrate advantages of such approaches. We also characterise the space of suitable agglomeration functions for group errors, generalising previously proposed calibration scores. Complementary to such population-level scores, we explore calibration scores at the individual level and analyse their relationship to choices of grouping. We draw on these insights to introduce and axiomatise fairness deviation measures for population-level scores. We demonstrate that with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.
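The two kinds of design choice named in the abstract, how datapoints are grouped and how group errors are agglomerated, can be illustrated with a minimal Python sketch. It is not the paper's formal definition: the function names (`group_errors`, `calibration_score`, `prediction_bins`), the absolute-gap group error, the equal-width binning, and the two agglomeration options (weighted mean and maximum) are all assumptions chosen for illustration.

```python
import numpy as np

def group_errors(preds, labels, groups):
    """Per-group gap |mean prediction - observed label frequency|, plus group weights.

    Hypothetical helper: the absolute gap is one possible group error, assumed here.
    """
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    ids = np.unique(groups)
    gaps = np.array(
        [abs(preds[groups == g].mean() - labels[groups == g].mean()) for g in ids]
    )
    weights = np.array([(groups == g).mean() for g in ids])
    return gaps, weights

def calibration_score(preds, labels, groups, agglomerate="mean"):
    """Agglomerate group errors into a single score (two illustrative choices)."""
    gaps, weights = group_errors(preds, labels, groups)
    if agglomerate == "mean":   # size-weighted average of group errors
        return float(np.sum(weights * gaps))
    if agglomerate == "max":    # worst-case group error
        return float(gaps.max())
    raise ValueError(f"unknown agglomeration: {agglomerate}")

def prediction_bins(preds, n_bins=10):
    """Group by prediction value: equal-width bins on [0, 1]."""
    return np.minimum((np.asarray(preds) * n_bins).astype(int), n_bins - 1)

# Toy data (made up for illustration only).
preds = np.array([0.2, 0.2, 0.8, 0.8])
labels = np.array([0, 0, 1, 1])
feature_groups = np.array([0, 1, 0, 1])  # grouping by a hypothetical input feature

score_by_prediction = calibration_score(preds, labels, prediction_bins(preds))
score_by_feature = calibration_score(preds, labels, feature_groups)
```

On this toy data the same predictions receive different scores under the two groupings, which shows that grouping is a genuine design choice rather than an implementation detail. Swapping `agglomerate="mean"` for `"max"` changes the score from an average group error to a worst-group error.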
