Multi-Differential Fairness Auditor for Black Box Classifiers
This work addresses fairness auditing of sensitive decision-making systems and provides a method for detecting specific discriminatory patterns, although its contribution is incremental: it applies fairness metrics to black box settings.
The paper tackles the problem of identifying who is discriminated against by a black box classifier, measuring discrimination as a violation of multi-differential fairness. Applied to a recidivism risk assessment classifier, it finds that African-American individuals with little criminal history are three times more likely to be labeled high risk than similar individuals who are not African-American.
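The "three times more likely" comparison can be read as a ratio of positive-outcome rates between the two sensitive groups inside one subgroup. The sketch below assumes that reading and also assumes that the size of a violation, written alpha here, is the log of the worst such ratio; the paper's exact definition may differ (for example, it may condition on outcomes rather than on the sensitive attribute). The record layout `(priors, sensitive, predicted_high_risk)`, the helpers `positive_rate` and `subgroup_violation`, and all counts are hypothetical and invented for illustration, not taken from the recidivism data.

```python
import math

# Hypothetical toy records: (priors_count, sensitive, predicted_high_risk).
# All counts are invented for illustration; they are not real recidivism data.
TOY_RECORDS = (
    [(0, 1, 1)] * 3 + [(0, 1, 0)] * 7     # s=1, few priors: 3/10 labeled high risk
    + [(0, 0, 1)] * 1 + [(0, 0, 0)] * 9   # s=0, few priors: 1/10 labeled high risk
    + [(5, 1, 1)] * 6 + [(5, 1, 0)] * 4   # s=1, many priors: 6/10
    + [(5, 0, 1)] * 6 + [(5, 0, 0)] * 4   # s=0, many priors: 6/10
)


def positive_rate(records, in_group, s_value):
    """Share of positive outcomes among subgroup members with sensitive value s_value."""
    outcomes = [y for x, s, y in records if in_group(x) and s == s_value]
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")


def subgroup_violation(records, in_group):
    """Worst-case rate ratio between the two sensitive groups inside the
    subgroup, and its log (assumed here to play the role of alpha)."""
    r1 = positive_rate(records, in_group, 1)
    r0 = positive_rate(records, in_group, 0)
    if r1 == r0:
        return 1.0, 0.0            # identical rates: no violation
    if r1 == 0 or r0 == 0:
        return math.inf, math.inf  # one group never receives the outcome
    ratio = max(r1 / r0, r0 / r1)
    return ratio, math.log(ratio)


ratio, alpha = subgroup_violation(TOY_RECORDS, lambda priors: priors <= 1)
pop_ratio, pop_alpha = subgroup_violation(TOY_RECORDS, lambda priors: True)
print(f"few priors: ratio={ratio:.2f}, alpha={alpha:.3f}")
print(f"everyone:   ratio={pop_ratio:.2f}, alpha={pop_alpha:.3f}")
```

With these made-up counts the few-priors subgroup has a ratio of 3 (alpha = ln 3, about 1.10), while the population-wide ratio is only about 1.29. A check restricted to the whole population would therefore understate a violation that is concentrated in a small group.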
Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information about the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and the classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and demonstrate that individuals identified as African-American with little criminal history are three times more likely to be considered at high risk of violent recidivism than similar individuals who are not African-American.
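The reduction described in the abstract (match distributions, then predict where sensitive attributes and outcomes coincide) can be illustrated on a toy example. This sketch assumes exact matching on a single discrete audited feature: each feature stratum is reweighted so that both sensitive groups carry half of its mass. After that balancing, the weighted rate at which the sensitive attribute s equals the binary outcome y inside a subgroup is 1/2 plus half the difference in reweighted positive rates, so any deviation from 1/2 flags a violation. In place of a learned predictor, the sketch brute-forces threshold rules `priors <= t`; that search, the helpers `matching_weights`, `coincidence_rate` and `worst_threshold`, and the data are hypothetical simplifications, not mdfa itself.

```python
from collections import defaultdict

# Hypothetical toy records: (priors_count, sensitive, predicted_high_risk).
# Invented counts; the many-priors stratum is deliberately unbalanced in s.
TOY_RECORDS = (
    [(0, 1, 1)] * 3 + [(0, 1, 0)] * 7
    + [(0, 0, 1)] * 1 + [(0, 0, 0)] * 9
    + [(5, 1, 1)] * 6 + [(5, 1, 0)] * 4
    + [(5, 0, 1)] * 12 + [(5, 0, 0)] * 8
)


def matching_weights(records):
    """Exact matching on the audited feature: inside each stratum x, give both
    sensitive groups half of the stratum's mass. Unmatched strata get weight 0."""
    counts = defaultdict(lambda: [0, 0])  # x -> [count with s=0, count with s=1]
    for x, s, _ in records:
        counts[x][s] += 1
    weights = []
    for x, s, _ in records:
        n0, n1 = counts[x]
        weights.append((n0 + n1) / (2 * counts[x][s]) if n0 and n1 else 0.0)
    return weights


def coincidence_rate(records, weights, in_group):
    """Weighted share of subgroup members whose sensitive attribute equals the
    classifier's outcome; equals 0.5 when the reweighted positive rates of the
    two sensitive groups match."""
    num = den = 0.0
    for (x, s, y), w in zip(records, weights):
        if in_group(x):
            den += w
            num += w * (s == y)
    return num / den if den else float("nan")


def worst_threshold(records, thresholds):
    """Brute-force stand-in for a learned predictor: score every rule
    `priors <= t` and return the (coincidence rate, t) pair furthest from 0.5."""
    weights = matching_weights(records)
    scored = [
        (coincidence_rate(records, weights, lambda x, t=t: x <= t), t)
        for t in thresholds
    ]
    # Largest deviation from 0.5 in either direction is the worst violation.
    return max(scored, key=lambda pair: abs(pair[0] - 0.5))


rate, t = worst_threshold(TOY_RECORDS, thresholds=[0, 5])
print(f"worst subgroup: priors <= {t}, coincidence rate {rate:.2f}")
```

On these invented counts the search flags `priors <= 0` with a coincidence rate of 0.6, which corresponds to a 20-point gap in positive rates (0.3 versus 0.1), while the whole population scores only 0.54 after matching. A fuller version would presumably fit a classifier to the weighted coincidence labels instead of enumerating thresholds, and would evaluate the selected subgroup on held-out data.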