Bias In, Bias Out? Evaluating the Folk Wisdom
This paper addresses a question in algorithmic fairness that matters to policymakers and researchers: it shows that bias in training data does not always lead to biased outcomes, though the results are theoretical and simulation-based.
The paper investigates whether algorithmic decision rules trained on data from biased human decision-makers necessarily inherit that bias, finding that under certain conditions, the algorithm can actually favor the group discriminated against, a phenomenon termed 'bias reversal'.
We evaluate the folk wisdom that algorithmic decision rules trained on data produced by biased human decision-makers necessarily reflect this bias. We consider a setting where training labels are only generated if a biased decision-maker takes a particular action, and so "biased" training data arise due to discriminatory selection into the training data. In our baseline model, the more biased the decision-maker is against a group, the more the algorithmic decision rule favors that group. We refer to this phenomenon as "bias reversal." We then clarify the conditions that give rise to bias reversal. Whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker affects the training data as well as the label used in training. We illustrate our main theoretical results in a simulation study applied to the New York City Stop, Question and Frisk dataset.
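The selection mechanism behind bias reversal can be illustrated with a small toy simulation. This is a minimal sketch under assumptions of our own, not the paper's model or its Stop, Question and Frisk simulation: latent risk is uniform on [0, 1] and seen only by the decision-maker, the decision-maker searches anyone whose risk exceeds a group-specific threshold, and bias against group 1 is modeled as a lower threshold for that group. The function name `simulate_predicted_risk` and its parameters are hypothetical. Because labels exist only for searched individuals, the marginal low-risk searches of group 1 pull down that group's average label in the training data.

```python
import random


def simulate_predicted_risk(bias, base_threshold=0.5, n=100_000, seed=0):
    """Return the group-level hit rate an algorithm would learn from searched cases.

    Assumed toy setup (not taken from the paper): group 1 is the group the
    decision-maker is biased against, so it is searched whenever latent
    risk exceeds base_threshold - bias; group 0 uses base_threshold.
    """
    rng = random.Random(seed)
    thresholds = {0: base_threshold, 1: base_threshold - bias}
    hits = {0: 0, 1: 0}
    searched = {0: 0, 1: 0}
    for group in (0, 1):
        for _ in range(n):
            risk = rng.random()  # latent risk, observed by the decision-maker only
            if risk > thresholds[group]:  # a label is generated only if a search happens
                searched[group] += 1
                hits[group] += rng.random() < risk  # outcome is Bernoulli(risk)
    # With no covariates, the "algorithm" is just the mean observed label per group.
    return {g: hits[g] / searched[g] for g in (0, 1)}


if __name__ == "__main__":
    for bias in (0.0, 0.2, 0.4):
        print(bias, simulate_predicted_risk(bias))
```

In this setup the expected label among searched members of a group with threshold t is (1 + t) / 2, so a larger bias lowers the predicted risk for group 1 while leaving group 0 unchanged. A rule that acts on these predictions would therefore treat group 1 more favorably as the bias grows. As the abstract notes, this depends on how the decision-maker affects the training data and on which label is used, so a different selection rule or label could produce inherited bias instead.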