LG · Jun 20, 2025

Critical Appraisal of Fairness Metrics in Clinical Predictive AI

João Matos, Ben Van Calster, Leo Anthony Celi, Paula Dhiman, Judy Wawira Gichoya, Richard D. Riley, Chris Russell, Sara Khalid, Gary S. Collins

arXiv:2506.17035v1 · 3 citations · h-index: 48

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of ensuring fairness in clinical predictive AI, but its contribution is incremental: it reviews existing metrics rather than proposing new solutions.

The paper tackles the unclear definition of fairness in clinical predictive AI through a scoping review that identified and critically appraised 62 fairness metrics from 41 studies, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures.

Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but risks perpetuating biases if fairness is inadequately addressed. However, the definition of "fairness" remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a "fairness metric" as a measure quantifying whether a model discriminates (societally) against individuals or groups defined by sensitive attributes. We searched five databases (2014-2024), screening 820 records, to include 41 studies, and extracted 62 fairness metrics. Metrics were classified by performance-dependency, model output level, and base performance metric, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, including only one clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness and identify gaps in uncertainty quantification, intersectionality, and real-world applicability. Future work should prioritise clinically meaningful metrics.
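To make the point about threshold-dependent measures concrete, here is a minimal sketch in Python. It assumes one commonly used group fairness measure, the difference in true positive rate between two groups, which is not claimed to be one of the 62 metrics in the review. The function names, the group labels `"a"` and `"b"`, and the toy data are all hypothetical and invented for illustration.

```python
def true_positive_rate(y_true, y_score, threshold):
    """Share of actual positives whose score is at or above the threshold."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    if not positives:
        raise ValueError("group has no positive cases")
    return sum(s >= threshold for s in positives) / len(positives)


def tpr_difference(y_true, y_score, group, threshold):
    """TPR of group 'a' minus TPR of group 'b' at a given decision threshold."""
    rates = {}
    for g in ("a", "b"):
        yt = [y for y, gg in zip(y_true, group) if gg == g]
        ys = [s for s, gg in zip(y_score, group) if gg == g]
        rates[g] = true_positive_rate(yt, ys, threshold)
    return rates["a"] - rates["b"]


# Toy data, invented for illustration only (not from the paper).
y_true = [1, 1, 1, 0, 1, 1, 1, 0]
y_score = [0.9, 0.7, 0.4, 0.2, 0.8, 0.5, 0.3, 0.1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Same model, same patients, two different decision thresholds.
for t in (0.35, 0.45):
    print(t, round(tpr_difference(y_true, y_score, group, t), 3))
```

On this toy data the gap is about 0.333 at a threshold of 0.35 and 0.0 at a threshold of 0.45, so the same model looks unfair or fair depending only on where the cut-off is placed. This is one reason a fairness assessment based on a threshold-dependent measure needs the threshold to be clinically justified.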
