Image Counterfactual Sensitivity Analysis for Detecting Unintended Bias
This work addresses the need for bias detection in facial classifiers used in high-impact applications such as authentication and surveillance. It is presented as a proof-of-concept, and its advances are incremental.
The authors propose an image counterfactual sensitivity analysis framework for detecting unintended bias in facial analysis models. The framework uses generative adversarial networks to manipulate face characteristics and measures the effect of each manipulation on a smiling attribute classifier, revealing several factors that influence its predictions.
Facial analysis models are increasingly used in applications that have serious impacts on people's lives, ranging from authentication to surveillance tracking. It is therefore critical to develop techniques that can reveal unintended biases in facial classifiers to help guide the ethical use of facial analysis technology. This work proposes a framework called \textit{image counterfactual sensitivity analysis}, which we explore as a proof-of-concept in analyzing a smiling attribute classifier trained on faces of celebrities. The framework utilizes counterfactuals to examine how a classifier's prediction changes if a face characteristic slightly changes. We leverage recent advances in generative adversarial networks to build a realistic generative model of face images that affords controlled manipulation of specific image characteristics. We then introduce a set of metrics that measure the effect of manipulating a specific property on the output of the trained classifier. Empirically, we find several different factors of variation that affect the predictions of the smiling classifier. This proof-of-concept demonstrates potential ways generative models can be leveraged for fine-grained analysis of bias and fairness.
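As a rough illustration of the idea, the sketch below shifts latent codes along one direction, generates the original and counterfactual images, and compares the classifier's scores on the two. Everything in it is an assumption made for illustration: the toy linear generator and logistic classifier, the names `counterfactual_sensitivity`, `direction`, and `alpha`, and the three summary statistics are stand-ins, not the paper's GAN, smiling classifier, or metric definitions.

```python
import numpy as np


def counterfactual_sensitivity(generator, classifier, z, direction, alpha, threshold=0.5):
    """Compare classifier scores on G(z) and on the counterfactual G(z + alpha * direction).

    All names and statistics here are hypothetical; they are not taken from the paper.
    """
    base = classifier(generator(z))                          # scores on original images
    shifted = classifier(generator(z + alpha * direction))   # scores on counterfactuals
    delta = shifted - base
    # A "flip" is a decision that crosses the threshold after the manipulation.
    flipped = (base >= threshold) != (shifted >= threshold)
    return {
        "mean_change": float(np.mean(delta)),
        "mean_abs_change": float(np.mean(np.abs(delta))),
        "flip_rate": float(np.mean(flipped)),
    }


# Toy stand-ins (assumptions): a fixed random "generator" and a logistic "classifier".
rng = np.random.default_rng(0)
latent_dim, image_dim = 8, 16
W_gen = rng.normal(size=(latent_dim, image_dim))
w_clf = rng.normal(size=image_dim)


def toy_generator(z):
    # Maps latent codes of shape (n, latent_dim) to fake "images" of shape (n, image_dim).
    return np.tanh(z @ W_gen)


def toy_classifier(x):
    # Returns a score in (0, 1) per image, standing in for P(smiling).
    return 1.0 / (1.0 + np.exp(-(x @ w_clf)))


z = rng.normal(size=(1000, latent_dim))
direction = np.zeros(latent_dim)
direction[0] = 1.0  # hypothetical latent axis tied to one face characteristic

report = counterfactual_sensitivity(toy_generator, toy_classifier, z, direction, alpha=0.5)
print(report)
```

In a real analysis, the generator and classifier would be trained models and `direction` would have to be learned so that it changes only the characteristic of interest. A large mean change or flip rate along such a direction would suggest that the classifier is sensitive to that characteristic.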