Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
This work addresses fairness in machine learning for text classification. It provides tools for detecting subtle biases that could affect fairness in society, though the contribution is incremental: it builds on existing bias measurement approaches.
The paper tackles unintended bias in text classification by introducing a suite of threshold-agnostic metrics that measure nuanced performance differences across demographic groups. It applies these metrics to a new, large test set of online comments to identify subtle biases in existing models.
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
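The abstract does not spell out the individual metrics, so the following is only an illustrative sketch of what a threshold-agnostic, per-group comparison could look like. It computes ROC AUC over all examples and then over only the comments that reference a given identity. Because AUC depends on how scores are ranked rather than on a chosen cut-off, a gap between the two numbers hints that the score distribution differs for that group. The function names (`roc_auc`, `subgroup_auc`) and the toy scores, labels, and group flags are hypothetical and are not taken from the paper or its test set.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive example
    outscores a random negative one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(scores: Sequence[float],
                 labels: Sequence[int],
                 in_group: Sequence[bool]) -> float:
    """ROC AUC restricted to examples flagged as referencing the group."""
    idx = [i for i, g in enumerate(in_group) if g]
    return roc_auc([scores[i] for i in idx], [labels[i] for i in idx])


# Hypothetical toy data: classifier scores, true labels (1 = toxic),
# and whether each comment references the identity group of interest.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
in_group = [True, True, True, True, False, False, False, False]

overall = roc_auc(scores, labels)
group = subgroup_auc(scores, labels, in_group)
background = subgroup_auc(scores, labels, [not g for g in in_group])

print(f"overall AUC:    {overall:.4f}")
print(f"subgroup AUC:   {group:.4f}")
print(f"background AUC: {background:.4f}")
```

In this toy data the classifier separates the classes less well within the group than outside it, which is the kind of difference a single overall figure would hide. The pairwise AUC computation is quadratic in the number of examples and is meant for illustration only; a rank-based implementation would be preferable on a large test set.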