Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment
The paper addresses fairness across social groups in machine learning. Its contribution is incremental in that it builds on the existing fairness notions of disparate treatment and disparate impact.
The paper tackles unfairness in automated decision-making systems by introducing a new notion of unfairness, disparate mistreatment, defined in terms of misclassification rates across social groups. It proposes a method for incorporating fairness constraints based on this notion into classifiers. Experiments on synthetic and real-world datasets show that the method is effective, often at a small cost in accuracy.
Automated data-driven decision-making systems are increasingly being used to assist, or even replace, humans in many settings. These systems function by learning from historical decisions, often taken by humans. To maximize the utility of these systems (or classifiers), their training involves minimizing the errors (or misclassifications) over the given historical data. However, it is quite possible that the optimally trained classifier makes decisions for people belonging to different social groups with different misclassification rates (e.g., misclassification rates for females are higher than for males), thereby placing these groups at an unfair disadvantage. To account for and avoid such unfairness, in this paper we introduce a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. We then propose intuitive measures of disparate mistreatment for decision boundary-based classifiers, which can be easily incorporated into their formulation as convex-concave constraints. Experiments on synthetic as well as real-world datasets show that our methodology is effective at avoiding disparate mistreatment, often at a small cost in terms of accuracy.
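Since disparate mistreatment is defined in terms of misclassification rates, a small measurement sketch may make the notion concrete. The code below only measures per-group error rates for predictions that already exist. It is not the paper's constrained training method, and it does not implement the convex-concave constraints. It assumes binary labels in {0, 1} and an integer-coded sensitive attribute. Breaking the error down into false positive and false negative rates is one common way to do this. The function names `group_error_rates` and `mistreatment_gaps` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def group_error_rates(y_true, y_pred, group):
    """Per-group (overall error, false positive rate, false negative rate).

    Assumes binary labels in {0, 1} and integer group codes (an assumption
    made for this sketch, not something prescribed by the paper).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)

    rates = {}
    for g in np.unique(group):
        mask = group == g
        t, p = y_true[mask], y_pred[mask]
        neg, pos = t == 0, t == 1
        err = float(np.mean(t != p))
        # A rate is undefined (NaN) if the group has no negatives/positives.
        fpr = float(np.mean(p[neg] == 1)) if neg.any() else float("nan")
        fnr = float(np.mean(p[pos] == 0)) if pos.any() else float("nan")
        rates[int(g)] = (err, fpr, fnr)
    return rates


def mistreatment_gaps(rates, a, b):
    """Differences in (error, FPR, FNR) between group a and group b."""
    return tuple(ra - rb for ra, rb in zip(rates[a], rates[b]))


# Toy data: the classifier errs only on members of group 0.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = group_error_rates(y_true, y_pred, group)
gaps = mistreatment_gaps(rates, 0, 1)
print(rates)
print(gaps)
```

In this toy example the two groups have identical label distributions, yet every error falls on group 0. All three gaps are therefore nonzero, which is the situation the abstract describes as disparate mistreatment.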