An Operational Perspective to Fairness Interventions: Where and How to Intervene
This work addresses the practical operationalization of fairness in AI systems for developers and policymakers. Its contribution is incremental, since it builds on existing fairness intervention methods.
The paper tackles the challenge of balancing predictive performance, fairness, and operational costs in AI decision systems by proposing a holistic framework for evaluating fairness interventions, focusing on where and how to intervene. In a case study on predictive parity, it shows that distributionally robust optimization methods achieve significant Pareto improvements without using group data at inference time, although they still require group data during training. A benchmark of close to 400 variations shows that a plain XGBoost model often Pareto-dominates neural networks that use fairness interventions.
As AI-based decision systems proliferate, their successful operationalization requires balancing multiple desiderata: predictive performance, disparity across groups, safeguarding sensitive group attributes (e.g., race), and engineering cost. We present a holistic framework for evaluating and contextualizing fairness interventions with respect to these desiderata. The two key points of practical consideration are \emph{where} (pre-, in-, or post-processing) and \emph{how} (in what way the sensitive group data is used) the intervention is introduced. We demonstrate our framework with a case study on predictive parity. In it, we first propose a novel method for achieving predictive parity without using group data at inference time via distributionally robust optimization. We then showcase the effectiveness of this approach in a benchmarking study of close to 400 variations across two major model types (XGBoost vs. neural networks), ten datasets, and over twenty unique methodologies. Methodological insights derived from our empirical study inform the practical design of ML workflows with fairness as a central concern. We find that predictive parity is difficult to achieve without using group data. Although the distributionally robust methods we develop require group data during model training (but not at inference), they provide significant Pareto improvements. Moreover, a plain XGBoost model often Pareto-dominates neural networks with fairness interventions, highlighting the importance of model inductive bias.
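To make the two evaluation notions in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes the common reading of predictive parity as equal positive predictive value (PPV) across groups, which may differ from the paper's exact formulation, and it treats each model as a hypothetical (error, disparity) pair where lower is better on both axes. The function names `ppv_by_group`, `predictive_parity_gap`, and `pareto_dominates`, and the toy data, are illustrative assumptions.

```python
from collections import defaultdict


def ppv_by_group(y_true, y_pred, groups):
    """Positive predictive value P(y = 1 | y_hat = 1), computed per group.

    Groups with no positive predictions are omitted from the result.
    """
    hits = defaultdict(int)     # true positives per group
    flagged = defaultdict(int)  # predicted positives per group
    for y, y_hat, g in zip(y_true, y_pred, groups):
        if y_hat == 1:
            flagged[g] += 1
            hits[g] += int(y == 1)
    return {g: hits[g] / flagged[g] for g in flagged}


def predictive_parity_gap(y_true, y_pred, groups):
    """Largest difference in PPV between any two groups (0 means parity)."""
    ppv = ppv_by_group(y_true, y_pred, groups)
    return max(ppv.values()) - min(ppv.values())


def pareto_dominates(a, b):
    """True if model a is no worse than b on every metric and strictly
    better on at least one. Metrics are assumed to be lower-is-better,
    e.g. (error, disparity)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Toy data (illustrative only, not from the paper's benchmark).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = predictive_parity_gap(y_true, y_pred, groups)
```

Note that the group labels are used here only to audit a fitted model. This matches the setting in the abstract, where group data may be available for training and evaluation but is not an input to the model at inference time.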