CYLG · Aug 29, 2023

Reliability Gaps Between Groups in COMPAS Dataset

arXiv:2308.15243v1 · 2 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses fairness concerns in criminal justice risk assessments for marginalized groups, but it is incremental: it builds on existing reliability studies using a simulation approach.

This paper investigates whether different social groups are affected differently by inter-rater reliability issues in risk assessment instruments, using a simulation study on the COMPAS dataset with injected noise. The main finding is that there are systematic differences in output reliability between groups, with the sign of the difference depending on the statistical measure used and whether group prediction prevalences are corrected.

This paper investigates the inter-rater reliability of risk assessment instruments (RAIs). The main question is whether different, socially salient groups are affected differently by a lack of inter-rater reliability of RAIs, that is, whether mistakes with respect to different groups affect them differently. The question is investigated with a simulation study of the COMPAS dataset. A controlled degree of noise is injected into the input data of a predictive model; the noise can be interpreted as a synthetic rater that makes mistakes. The main finding is that there are systematic differences in output reliability between groups in the COMPAS dataset. The sign of the difference depends on the kind of inter-rater statistic that is used (Cohen's Kappa, Byrt's PABAK, ICC), and in particular on whether or not a correction for the groups' prediction prevalences is used.
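To make the setup concrete, here is a minimal sketch, assuming binary high/low risk outputs, of how per-group agreement between a model's predictions on clean inputs and on noise-injected inputs could be scored. It is not the paper's code. The flip-noise rater, the threshold "model", the function names (synthetic_rater, toy_model, group_reliability) and the toy records are all hypothetical, and ICC is omitted. It illustrates the prevalence point: Cohen's kappa corrects for chance agreement using each rater's marginal prevalence, while Byrt's PABAK for two categories reduces to 2·p_o − 1 and ignores prevalence. For example, 9 of 10 matching ratings with one rater never predicting "high risk" gives kappa 0 but PABAK 0.8.

```python
import random

def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters (lists of 0/1 of equal length)."""
    assert len(a) == len(b) and a, "raters must rate the same non-empty set"
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n    # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n              # each rater's prevalence of 1
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)        # chance agreement from marginals
    if p_e == 1.0:
        return float("nan")  # both raters constant and identical: kappa undefined
    return (p_o - p_e) / (1 - p_e)

def pabak(a, b):
    """Prevalence- and bias-adjusted kappa (Byrt et al.) for two categories: 2*p_o - 1."""
    assert len(a) == len(b) and a
    p_o = sum(x == y for x, y in zip(a, b)) / len(a)
    return 2 * p_o - 1

def synthetic_rater(features, flip_prob, rng):
    """Hypothetical noisy rater: flips each binary feature with probability flip_prob."""
    return [f ^ 1 if rng.random() < flip_prob else f for f in features]

def toy_model(features):
    """Hypothetical stand-in for a trained classifier: high risk if >= 2 flags are set."""
    return int(sum(features) >= 2)

def group_reliability(records, flip_prob, seed=0):
    """Per-group agreement between predictions on clean and on noise-injected inputs.

    records: iterable of (group_label, list_of_binary_features).
    """
    rng = random.Random(seed)
    by_group = {}
    for group, feats in records:
        clean = toy_model(feats)
        noisy = toy_model(synthetic_rater(feats, flip_prob, rng))
        a, b = by_group.setdefault(group, ([], []))
        a.append(clean)
        b.append(noisy)
    return {g: {"kappa": cohens_kappa(a, b), "pabak": pabak(a, b)}
            for g, (a, b) in by_group.items()}

if __name__ == "__main__":
    gen = random.Random(42)
    # Invented toy records: group "A" sets flags more often than group "B",
    # so the two groups have different prediction prevalences.
    records = [("A", [int(gen.random() < 0.5) for _ in range(4)]) for _ in range(500)]
    records += [("B", [int(gen.random() < 0.2) for _ in range(4)]) for _ in range(500)]
    for g, stats in sorted(group_reliability(records, flip_prob=0.1).items()):
        print(g, round(stats["kappa"], 3), round(stats["pabak"], 3))
```

Running the demo prints one kappa/PABAK pair per group. Because the groups differ in how often the toy model predicts "high risk", the two statistics can rank the groups differently. This mirrors, in a toy setting, the abstract's observation that the sign of the gap depends on whether a prevalence correction is used.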

Code Implementations: 1 repo