ML · LG · May 30, 2018

Why Is My Classifier Discriminatory?

Irene Chen, Fredrik D. Johansson, David Sontag

arXiv:1805.12002v235.0446 citations

Originality: Incremental advance

AI Analysis

This work addresses fairness in predictive models for sensitive applications such as healthcare or criminal justice. It offers a data-centric approach that is an incremental advance over existing methods built around the fairness-accuracy trade-off.

The authors tackle unfairness in classifiers by arguing that discrimination can arise from data issues such as small sample sizes or missing predictive variables, and that these should be addressed through improved data collection rather than model constraints. They decompose cost-based discrimination metrics into bias, variance, and noise, and validate the analysis in case studies on income, mortality, and review ratings, finding that data collection can often reduce discrimination without sacrificing accuracy.

Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have devastating consequences. In this work, we argue that the fairness of predictions should be evaluated in the context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. We decompose cost-based metrics of discrimination into bias, variance, and noise, and propose actions aimed at estimating and reducing each term. Finally, we perform case studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy.
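
The decomposition described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes squared-error loss, synthetic data with a known generating function, and a separate linear fit for each group, where the paper considers a single model evaluated per group. The names `decompose_group_error` and `discrimination_gap` are hypothetical helpers introduced here. The only point is to show how a gap in group-wise error splits into bias, variance, and noise differences, and how a small sample for one group shows up in the variance term.

```python
import numpy as np


def decompose_group_error(n_train, noise_sd, true_fn,
                          n_repeats=200, n_test=500, seed=0):
    """Estimate squared bias, variance, and noise of a linear fit for one group.

    Synthetic setting (an assumption of this sketch): x ~ Uniform(-1, 1),
    y = true_fn(x) + Gaussian noise with standard deviation noise_sd.
    """
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(-1.0, 1.0, n_test)
    preds = np.empty((n_repeats, n_test))

    # Refit the model on many independent training sets of the same size.
    for r in range(n_repeats):
        x = rng.uniform(-1.0, 1.0, n_train)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_train)
        slope, intercept = np.polyfit(x, y, deg=1)
        preds[r] = slope * x_test + intercept

    mean_pred = preds.mean(axis=0)
    return {
        # Squared gap between the average prediction and the true function.
        "bias": float(np.mean((mean_pred - true_fn(x_test)) ** 2)),
        # Spread of predictions across training sets.
        "variance": float(np.mean(preds.var(axis=0))),
        # Irreducible error, known here because the data are synthetic.
        "noise": float(noise_sd ** 2),
    }


def discrimination_gap(group_a, group_b):
    """Difference in expected squared error (group_b minus group_a), per term."""
    gap = {k: group_b[k] - group_a[k] for k in ("bias", "variance", "noise")}
    gap["total"] = gap["bias"] + gap["variance"] + gap["noise"]
    return gap


def linear_truth(x):
    return 2.0 * x


# Group A is well sampled; group B has only a handful of training examples.
group_a = decompose_group_error(n_train=200, noise_sd=0.5, true_fn=linear_truth)
group_b = decompose_group_error(n_train=10, noise_sd=0.5, true_fn=linear_truth)
print(discrimination_gap(group_a, group_b))
```

In this toy setting the two groups share the same noise level and the same well-specified model, so the gap comes almost entirely from the variance term, which is the case the abstract suggests addressing by collecting more samples for the under-represented group. A gap dominated by the noise term would instead point toward measuring additional predictive variables.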
