CV · Mar 7, 2025

Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models

arXiv:2503.05093v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses bias in AI for fairness and ethics, revealing nuanced stereotyping that may undermine conventional mitigation strategies.

The study investigated stereotyping in Vision-Language Models (VLMs) beyond trait associations, finding that VLMs generate more uniform stories for women than for men and for White Americans than for Black Americans. Greater gender prototypicality was linked to stronger uniformity, whereas race prototypicality was not.

Current research on bias in Vision-Language Models (VLMs) has important limitations: it is focused exclusively on trait associations while ignoring other forms of stereotyping, it examines specific contexts where biases are expected to appear, and it conceptualizes social categories like race and gender as binary, ignoring the multifaceted nature of these identities. Using standardized facial images that vary in prototypicality, we test four VLMs for both trait associations and homogeneity bias in open-ended contexts. We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly. By contrast, VLMs represent White Americans more uniformly than Black Americans. Unlike with gender prototypicality, race prototypicality was not related to stronger uniformity. In terms of trait associations, we find limited evidence of stereotyping: Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM. These findings demonstrate that VLM stereotyping manifests in ways that go beyond simple group membership, suggesting that conventional bias mitigation strategies may be insufficient to address VLM stereotyping and that homogeneity bias persists even when trait associations are less apparent in model outputs.

