LG | ML | Apr 25, 2025

A comprehensive review of classifier probability calibration metrics

arXiv:2504.18278v1 | 4 citations | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for reliable calibration assessment in safety-critical and business contexts, but it is incremental as it reviews and organizes existing metrics rather than introducing new ones.

This paper tackles the problem of AI/ML models producing probabilities that do not reflect true accuracy by providing a comprehensive review of 82 probability calibration metrics for classifier and object detection models, organizing them into families and including equations for implementation.

Probabilities or confidence values produced by artificial intelligence (AI) and machine learning (ML) models often do not reflect their true accuracy, with some models being under- or overconfident in their predictions. For example, if a model is 80% sure of an outcome, is it correct 80% of the time? Probability calibration metrics measure the discrepancy between confidence and accuracy, providing an independent assessment of model calibration performance that complements traditional accuracy metrics. Understanding calibration is important when the outputs of multiple systems are combined, for assurance in safety- or business-critical contexts, and for building user trust in models. This paper provides a comprehensive review of probability calibration metrics for classifier and object detection models, organising them according to a number of different categorisations to highlight their relationships. We identify 82 major metrics, which can be grouped into four classifier families (point-based, bin-based, kernel or curve-based, and cumulative) and an object detection family. For each metric, we provide equations where available, facilitating implementation and comparison by future researchers.
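To make the "80% sure, correct 80% of the time" question concrete, here is a minimal sketch of two widely used calibration measures: an equal-width binned expected calibration error (ECE) and the Brier score. These are common textbook forms, not necessarily the exact formulations given in the paper. Treating them as examples of the bin-based and point-based families is an assumption made here, and the function names and the default of 10 bins are illustrative choices.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE (illustrative sketch, not the paper's definition).

    confidences: predicted confidence of the chosen class, each in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges, e.g. 0.1, 0.2, ..., 0.9 for n_bins=10.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index for each prediction; bins are right-closed, (lo, hi].
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between observed accuracy and mean confidence in this bin,
            # weighted by the fraction of predictions that fall in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))


# Ten predictions, each made with 80% confidence.
conf = [0.8] * 10
well_calibrated = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # 8 of 10 correct
overconfident = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]    # 6 of 10 correct

ece_good = expected_calibration_error(conf, well_calibrated)
ece_bad = expected_calibration_error(conf, overconfident)
brier_good = brier_score(conf, well_calibrated)
```

In this toy case the model that is right 8 times out of 10 at 80% confidence has an ECE of approximately zero, while the model that is right only 6 times out of 10 has an ECE of about 0.2, the size of the confidence-accuracy gap. Binned estimates like this depend on the number and placement of bins, which is one reason several alternative metric families exist.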

