On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning
This work addresses systematic discrimination in AI systems, a critical fairness issue in domains such as criminal justice. Its contribution is incremental, however, since it builds on existing methods for bias correction.
The paper tackles the problem of disparate impact in machine learning by proposing an information-theoretic framework to analyze and correct discrimination in binary classification models, demonstrating its effectiveness on the COMPAS dataset for recidivism prediction.
In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. Our aim is to find a correction function that can perturb the input distributions of each group to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expressions to efficiently compute the correction function, and demonstrate the benefits of our framework on a recidivism prediction problem based on the ProPublica COMPAS dataset.
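To make the channel view concrete, the following is a minimal sketch of how disparate impact could be measured as a divergence between the output distributions of two groups. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the divergence, so KL divergence is assumed here; the function names (`output_distribution`, `kl_divergence`) and all numbers are hypothetical toy values; and the correction function itself is not implemented.

```python
import math


def output_distribution(input_dist, channel):
    """Push a group's input distribution through a fixed channel.

    input_dist[i] = P(X = i | group)
    channel[i][y] = P(Y = y | X = i)  (the fixed classifier)
    Returns the list P(Y = y | group).
    """
    n_out = len(channel[0])
    return [
        sum(p_x * row[y] for p_x, row in zip(input_dist, channel))
        for y in range(n_out)
    ]


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[y] > 0 wherever p[y] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy setup: three input values, binary output.
channel = [
    [0.9, 0.1],  # P(Y | X = 0)
    [0.5, 0.5],  # P(Y | X = 1)
    [0.2, 0.8],  # P(Y | X = 2)
]
p_x_group0 = [0.6, 0.3, 0.1]  # assumed input distribution, group 0
p_x_group1 = [0.2, 0.3, 0.5]  # assumed input distribution, group 1

out0 = output_distribution(p_x_group0, channel)
out1 = output_distribution(p_x_group1, channel)

# Assumed disparity measure: divergence between the two output distributions.
disparate_impact = kl_divergence(out0, out1)
print(out0, out1, disparate_impact)
```

In this picture the channel stays fixed, so the only way to shrink the divergence is to perturb the input distributions of each group, which is the role the abstract assigns to the correction function. A divergence of zero corresponds to output distributions that are identical across the two groups.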