LG CY
May 13, 2021

An Empirical Comparison of Bias Reduction Methods on Real-World Problems in High-Stakes Policy Settings

Hemank Lamba, Kit T. Rodolfa, Rayid Ghani

arXiv:2105.06442v17.518 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses fairness in high-stakes policy applications like education and healthcare, providing empirical insights for researchers and practitioners, though it is incremental in evaluating existing methods.

The study compared bias reduction methods across four real-world policy problems and found that post-processing with group-specific thresholds consistently removed disparities, while other methods showed variable and inconsistent fairness improvements.

Applications of machine learning (ML) to high-stakes policy settings -- such as education, criminal justice, healthcare, and social service delivery -- have grown rapidly in recent years, sparking important conversations about how to ensure fair outcomes from these systems. The machine learning research community has responded to this challenge with a wide array of proposed fairness-enhancing strategies for ML models, but despite the large number of methods that have been developed, little empirical work exists evaluating these methods in real-world settings. Here, we seek to fill this research gap by investigating the performance of several methods that operate at different points in the ML pipeline across four real-world public policy and social good problems. Across these problems, we find a wide degree of variability and inconsistency in the ability of many of these methods to improve model fairness, but post-processing by choosing group-specific score thresholds consistently removes disparities, with important implications for both the ML research community and practitioners deploying machine learning to inform consequential policy decisions.
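The post-processing idea highlighted in the abstract, choosing a separate score threshold for each group, can be illustrated with a minimal sketch. The abstract does not say which disparity measure is equalized; the sketch below assumes recall (true positive rate) as the target, and the function names and toy data are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import defaultdict


def group_thresholds(scores, labels, groups, target_recall):
    """For each group, pick the highest score threshold whose recall
    within that group reaches target_recall (assumed fairness target)."""
    positives = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            positives[g].append(s)
    thresholds = {}
    for g, pos_scores in positives.items():
        pos_scores.sort(reverse=True)
        # Number of true positives that must be selected to hit the target.
        needed = max(1, math.ceil(target_recall * len(pos_scores)))
        thresholds[g] = pos_scores[needed - 1]
    return thresholds


def recall_by_group(scores, labels, groups, thresholds):
    """Recall per group when each group is cut at its own threshold."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            total[g] += 1
            if s >= thresholds[g]:
                selected[g] += 1
    return {g: selected[g] / total[g] for g in total}


# Toy data (hypothetical): group "b" receives systematically lower scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b"]

# A single shared threshold leaves a large recall gap between groups.
shared = recall_by_group(scores, labels, groups, {"a": 0.5, "b": 0.5})

# Group-specific thresholds chosen to reach the same recall in each group.
thresholds = group_thresholds(scores, labels, groups, target_recall=0.5)
equalized = recall_by_group(scores, labels, groups, thresholds)
```

In this toy data the shared cutoff of 0.5 selects every positive in group "a" and none in group "b", while the per-group cutoffs give both groups the same recall. Real deployments would pick thresholds on held-out data and under a resource constraint, which this sketch does not model.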
