May 5, 2023

Statistical Inference for Fairness Auditing

arXiv:2305.03712v2, 18 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for model-agnostic fairness auditing in high-stakes applications such as recidivism prediction. It offers a method to flag subpopulations with performance problems or to certify that performance is adequate. The advance is incremental because it builds on existing statistical techniques.

The paper evaluates black-box models for fairness by framing fairness auditing as a multiple hypothesis testing problem. It uses the bootstrap to bound performance disparities simultaneously over a collection of groups with statistical guarantees, and reports that the resulting audits give interpretable and trustworthy guarantees on benchmark datasets.

Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "fairness auditing," in terms of multiple hypothesis testing. We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups with statistical guarantees. Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately. Crucially, our audit is model-agnostic and applicable to nearly any performance metric or group fairness criterion. Our methods also accommodate extremely rich -- even infinite -- collections of subpopulations. Further, we generalize beyond subpopulations by showing how to assess performance over certain distribution shifts. We test the proposed methods on benchmark datasets in predictive inference and algorithmic fairness and find that our audits can provide interpretable and trustworthy guarantees.
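The idea of bounding disparities simultaneously over groups with the bootstrap can be illustrated with a minimal Python sketch. This is not the authors' procedure. It assumes a finite collection of groups, each large enough to appear in every resample, and it uses a simple unstudentized max-deviation statistic. The function name `audit_groups`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def audit_groups(losses, groups, alpha=0.1, n_boot=2000, seed=0):
    """Simultaneous bootstrap bounds on per-group disparities (illustrative).

    losses: (n,) per-example losses, e.g. 1.0 for a false positive.
    groups: (n, G) boolean membership matrix, one column per group.
    Disparity of a group = mean loss in the group minus overall mean loss.
    Returns (estimate, lower, upper), each of shape (G,).
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups, dtype=bool)
    n = losses.shape[0]

    def disparity(idx):
        l, m = losses[idx], groups[idx]
        counts = np.maximum(m.sum(axis=0), 1)  # guard against empty groups
        group_mean = (m * l[:, None]).sum(axis=0) / counts
        return group_mean - l.mean()

    est = disparity(np.arange(n))
    # Resample examples with replacement and recompute every disparity.
    boot = np.stack(
        [disparity(rng.integers(0, n, size=n)) for _ in range(n_boot)]
    )
    # Largest deviation across groups in each resample; its (1 - alpha)
    # quantile gives one half-width that is applied to all groups at once.
    max_dev = np.abs(boot - est).max(axis=1)
    width = np.quantile(max_dev, 1 - alpha)
    return est, est - width, est + width


# Synthetic toy data (not from the paper): group 0 has a 50% loss rate,
# group 1 has a 10% loss rate, with 1000 examples each.
losses = np.concatenate([np.tile([1.0, 0.0], 500),
                         np.tile([1.0] + [0.0] * 9, 100)])
groups = np.zeros((2000, 2), dtype=bool)
groups[:1000, 0] = True
groups[1000:, 1] = True

est, lower, upper = audit_groups(losses, groups)
tolerance = 0.05  # hypothetical disparity tolerance
flagged = lower > tolerance     # disparity exceeds tolerance with confidence
certified = upper < tolerance   # disparity is below tolerance with confidence
```

In this sketch a group is flagged when its lower bound exceeds the tolerance and certified when its upper bound falls below it. Because a single half-width covers all groups, both conclusions are intended to hold simultaneously at level 1 - alpha under the assumptions stated above.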

Code Implementations: 1 repo
