Understanding Unequal Gender Classification Accuracy from Face Images
This work addresses fairness and bias in AI systems as they affect marginalized groups, offering incremental insight into the specific causes of performance disparities.
The paper investigates why commercial face classification services show unequal gender classification accuracy, in particular their poor performance on dark-skinned females. It finds that neither skin type nor hair length drives the gap. Instead, differences in lip, eye, and cheek structure across ethnicity appear to contribute, and the classifiers treat lip and eye makeup as strong predictors of a female face, which propagates a gender stereotype.
Recent work shows unequal performance of commercial face classification services in the gender classification task across intersectional groups defined by skin type and gender. Accuracy on dark-skinned females is significantly worse than on any other group. In this paper, we conduct several analyses to try to uncover the reason for this gap. The main finding, perhaps surprisingly, is that skin type is not the driver. This conclusion is reached via stability experiments that vary an image's skin type via color-theoretic methods, namely luminance mode-shift and optimal transport. A second suspect, hair length, is also shown not to be the driver via experiments on face images cropped to exclude the hair. Finally, using contrastive post-hoc explanation techniques for neural networks, we bring forth evidence suggesting that differences in lip, eye and cheek structure across ethnicity lead to the differences. Further, lip and eye makeup are seen as strong predictors for a female face, which is a troubling propagation of a gender stereotype.
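To make the skin-type stability experiment more concrete, here is a minimal sketch of what a luminance mode-shift could look like. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the color space, the histogram settings, or whether the shift is restricted to skin pixels. The function names (`luminance_mode`, `shift_luminance_mode`), the Rec. 601 luma weights, and the 64-bin histogram are all choices made here for illustration. The optimal-transport variant is not sketched.

```python
import numpy as np

# Assumption: Rec. 601 luma weights. They sum to 1, so adding a constant d
# to every channel shifts the luminance by exactly d (before clipping).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def luminance(rgb):
    """Per-pixel luminance of an H x W x 3 float image with values in [0, 1]."""
    return rgb @ LUMA_WEIGHTS


def luminance_mode(rgb, bins=64):
    """Center of the most populated luminance histogram bin."""
    hist, edges = np.histogram(luminance(rgb), bins=bins, range=(0.0, 1.0))
    k = np.argmax(hist)
    return 0.5 * (edges[k] + edges[k + 1])


def shift_luminance_mode(rgb, target_mode, bins=64):
    """Add a constant offset so the luminance mode moves to target_mode.

    Hypothetical helper: it shifts the whole image, whereas a real
    experiment would probably restrict the shift to a skin mask.
    """
    delta = target_mode - luminance_mode(rgb, bins)
    return np.clip(rgb + delta, 0.0, 1.0)


# Usage: darken a synthetic light image, then compare the two modes.
image = np.full((8, 8, 3), 0.7)
darker = shift_luminance_mode(image, target_mode=0.3)
print(luminance_mode(image), luminance_mode(darker))
```

In a stability experiment of the kind the abstract describes, the original and the shifted image would both be sent to the classifier, and a prediction that stays the same under the shift would count as evidence that skin type alone is not driving the result.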