LG CY AP ME ML · Feb 17, 2023

On (assessing) the fairness of risk score models

arXiv:2302.08851v2 · 29 citations · h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the fairness of risk score models. Because risk scores communicate uncertainty to users, they help enable meaningful human oversight in applications such as recidivism and mental health prediction. The contribution is incremental: it extends fairness concepts from classifications to risk scores.

The paper tackles fairness in risk score models, which have been studied less than discrete decisions. It proposes a key desideratum, namely that a risk score model should provide similar epistemic value to different groups. It also introduces a novel calibration error metric that is less sample size-biased than earlier metrics, which enables meaningful fairness comparisons between groups of different sizes.

Recent work on algorithmic fairness has largely focused on the fairness of discrete decisions, or classifications. While such decisions are often based on risk score models, the fairness of the risk models themselves has received considerably less attention. Risk models are of interest for a number of reasons, including the fact that they communicate uncertainty about the potential outcomes to users, thus representing a way to enable meaningful human oversight. Here, we address fairness desiderata for risk score models. We identify the provision of similar epistemic value to different groups as a key desideratum for risk score fairness. Further, we address how to assess the fairness of risk score models quantitatively, including a discussion of metric choices and meaningful statistical comparisons between groups. In this context, we also introduce a novel calibration error metric that is less sample size-biased than previously proposed metrics, enabling meaningful comparisons between groups of different sizes. We illustrate our methodology - which is widely applicable in many other settings - in two case studies, one in recidivism risk prediction, and one in risk of major depressive disorder (MDD) prediction.
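To make the sample-size issue concrete, here is a minimal sketch. The abstract does not define the proposed metric, so this is not the authors' method: it uses the standard equal-width binned expected calibration error (ECE) as a stand-in, and the names `binned_ece` and `simulate_calibrated_group`, the bin count, and the group sizes are all illustrative assumptions. It simulates a perfectly calibrated model for a small and a large group and compares the average ECE estimate for each.

```python
import numpy as np


def binned_ece(scores, outcomes, n_bins=10):
    """Standard equal-width binned ECE (a stand-in, not the paper's metric)."""
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Assign each score to one of n_bins equal-width bins on [0, 1].
    bin_ids = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between observed event rate and mean predicted risk in the bin.
        gap = abs(outcomes[mask].mean() - scores[mask].mean())
        # Weight the gap by the fraction of samples that fall in the bin.
        ece += mask.mean() * gap
    return ece


def simulate_calibrated_group(n, rng):
    """Hypothetical group whose outcomes follow the scores exactly."""
    scores = rng.uniform(0.0, 1.0, size=n)
    outcomes = rng.uniform(0.0, 1.0, size=n) < scores
    return scores, outcomes


rng = np.random.default_rng(0)
# Average the estimate over repeated draws for a small and a large group.
small = np.mean([binned_ece(*simulate_calibrated_group(100, rng))
                 for _ in range(200)])
large = np.mean([binned_ece(*simulate_calibrated_group(10_000, rng))
                 for _ in range(200)])
print(f"mean ECE, n=100:   {small:.3f}")
print(f"mean ECE, n=10000: {large:.3f}")
```

Both simulated groups are perfectly calibrated by construction, yet the smaller group receives the larger average ECE estimate. A naive comparison of this estimator across groups of different sizes can therefore suggest a calibration disparity that is only an artifact of sample size.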

