BiasDora: Exploring Hidden Biased Associations in Vision-Language Models
This work targets incomplete bias identification in VLMs, a problem relevant to both researchers and practitioners. Its contribution is incremental, since it extends existing bias-probing methodologies.
The paper tackles limited bias detection in Vision-Language Models by uncovering hidden, implicit associations across 9 bias dimensions, showing how these associations vary in negativity, toxicity, and extremity, and identifying subtle and extreme biases that existing methods do not recognize.
Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, implicit associations across 9 bias dimensions. We systematically explore diverse input and output modalities and (2) demonstrate how biased associations vary in their negativity, toxicity, and extremity. Our work (3) identifies subtle and extreme biases that are typically not recognized by existing methodologies. We make the Dataset of retrieved associations (Dora) publicly available at https://github.com/chahatraj/BiasDora.