Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
This work addresses a critical oversight in how gender bias is evaluated for the safe deployment of vision-language models: current benchmarks are flawed because they contain spurious correlations. The contribution is an incremental but important methodological improvement.
The paper asks whether spurious features distort gender bias evaluation in vision-language models. It systematically perturbs non-gender features in existing benchmarks and finds that minimal changes can shift bias scores by up to 175% in generative models and 43% in CLIP variants, which indicates that current assessments are unreliable.
Gender bias in vision-language foundation models (VLMs) raises concerns about their safe deployment and is typically evaluated using benchmarks with gender annotations on real-world images. However, these benchmarks often contain spurious correlations between gender and non-gender features, such as objects and backgrounds. We therefore identify a critical oversight in gender bias evaluation: do spurious features distort gender bias evaluation? To address this question, we systematically perturb non-gender features across four widely used benchmarks (COCO-gender, FACET, MIAP, and PHASE) and various VLMs to quantify their impact on bias evaluation. Our findings reveal that even minimal perturbations, such as masking just 10% of objects or weakly blurring backgrounds, can dramatically alter bias scores, shifting metrics by up to 175% in generative VLMs and 43% in CLIP variants. This suggests that current bias evaluations often reflect model responses to spurious features rather than gender bias, undermining their reliability. Since creating benchmarks free of spurious features is fundamentally challenging, we recommend reporting bias metrics alongside feature-sensitivity measurements to enable more reliable bias assessment.
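The recommendation to report bias metrics alongside feature-sensitivity measurements can be illustrated with a minimal sketch. It assumes that sensitivity is summarized as the percent change of a scalar bias score under each perturbation. The abstract does not specify the exact metric, so the names `relative_shift`, `feature_sensitivity`, and `SensitivityReport`, the perturbation labels, and all numbers below are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict


def relative_shift(baseline: float, perturbed: float) -> float:
    """Percent change of a bias score relative to its unperturbed value."""
    if baseline == 0:
        raise ValueError("baseline bias score must be non-zero")
    return 100.0 * (perturbed - baseline) / abs(baseline)


@dataclass
class SensitivityReport:
    """Bias score on the original benchmark plus its shift per perturbation."""
    baseline: float
    shifts: Dict[str, float]  # perturbation name -> percent shift

    @property
    def max_abs_shift(self) -> float:
        # Largest magnitude of change across all perturbations (0.0 if none).
        return max((abs(s) for s in self.shifts.values()), default=0.0)


def feature_sensitivity(
    baseline: float, perturbed_scores: Dict[str, float]
) -> SensitivityReport:
    """Pair a baseline bias score with its shift under each perturbation."""
    shifts = {
        name: relative_shift(baseline, score)
        for name, score in perturbed_scores.items()
    }
    return SensitivityReport(baseline=baseline, shifts=shifts)


# Illustrative numbers only (assumed, not taken from the paper).
report = feature_sensitivity(
    baseline=0.10,
    perturbed_scores={
        "mask_10pct_objects": 0.15,    # hypothetical score after object masking
        "weak_background_blur": 0.08,  # hypothetical score after background blur
    },
)

for name, shift in report.shifts.items():
    print(f"{name}: {shift:+.1f}%")
print(f"max |shift|: {report.max_abs_shift:.1f}%")
```

Reporting the baseline together with `max_abs_shift` lets a reader see at a glance whether a bias score is stable when non-gender features change. Under this assumed definition, a large shift suggests the score partly reflects spurious features rather than gender bias.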