CV | Dec 12, 2024
Vision-Language Models Generate More Homogeneous Stories for Phenotypically Black Individuals
Messi H. J. Lee, Soyeon Jeon
Vision-Language Models (VLMs) extend Large Language Models' capabilities by integrating image processing, but concerns persist about their potential to reproduce and amplify human biases. While research has documented how these models perpetuate stereotypes across demographic groups, most work has focused on between-group biases rather than within-group differences. This study investigates homogeneity bias (the tendency to portray groups as more uniform than they are) within Black Americans, examining how perceived racial phenotypicality influences VLM outputs. Using computer-generated images that systematically vary in phenotypicality, we prompted VLMs to generate stories about these individuals and measured text similarity to assess content homogeneity. Our findings reveal three key patterns. First, VLMs generate significantly more homogeneous stories about Black individuals with higher phenotypicality than about those with lower phenotypicality. Second, stories about Black women consistently display greater homogeneity than those about Black men across all models tested. Third, in two of the three VLMs, this homogeneity bias is primarily driven by a pronounced interaction: phenotypicality strongly influences content variation for Black women but has minimal impact for Black men. These results demonstrate how intersectionality shapes AI-generated representations and highlight the persistence of stereotyping that mirrors documented biases in human perception, where increased racial phenotypicality leads to greater stereotyping and less individualized representation.
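The abstract describes measuring text similarity to assess how homogeneous a set of generated stories is. One common way to operationalize this is the mean pairwise cosine similarity of story embeddings within a group: higher values mean more uniform content. The sketch below assumes the embeddings have already been computed by some sentence-embedding model (the abstract does not name one), and the function names and toy vectors are hypothetical illustrations, not the authors' pipeline.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Hypothetical homogeneity score: average cosine similarity over all
    unordered pairs of story embeddings in one group (needs >= 2 items)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings standing in for stories about two groups (assumed data).
uniform_group = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]]
varied_group = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(mean_pairwise_similarity(uniform_group))  # close to 1: homogeneous
print(mean_pairwise_similarity(varied_group))   # about -0.333: varied
```

Comparing this score across conditions (for example, high versus low phenotypicality) is what turns it into a measure of homogeneity bias; in practice one would also test whether the difference is statistically reliable.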
CV | Mar 7, 2025
Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models
Messi H. J. Lee, Soyeon Jeon, Jacob M. Montgomery et al.
Current research on bias in Vision-Language Models (VLMs) has important limitations: it focuses exclusively on trait associations while ignoring other forms of stereotyping, it examines specific contexts where biases are expected to appear, and it conceptualizes social categories like race and gender as binary, ignoring the multifaceted nature of these identities. Using standardized facial images that vary in prototypicality, we test four VLMs for both trait associations and homogeneity bias in open-ended contexts. We find that VLMs consistently generate more uniform stories for women than for men, with people who are more gender prototypical in appearance being represented more uniformly. For race, VLMs represent White Americans more uniformly than Black Americans. Unlike gender prototypicality, however, race prototypicality was not associated with greater uniformity. In terms of trait associations, we find limited evidence of stereotyping: Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM. These findings demonstrate that VLM stereotyping manifests in ways that go beyond simple group membership, suggesting that conventional bias mitigation strategies may be insufficient to address VLM stereotyping and that homogeneity bias persists even when trait associations are less apparent in model outputs.
CL | Jan 31, 2025
Token Sampling Uncertainty Does Not Explain Homogeneity Bias in Large Language Models
Messi H. J. Lee, Soyeon Jeon
Homogeneity bias is one form of stereotyping in AI models where certain groups are represented as more similar to each other than other groups. This bias is a major obstacle to creating equitable language technologies. We test whether the bias is driven by systematic differences in token-sampling uncertainty across six large language models. While we observe the presence of homogeneity bias using sentence similarity, we find very little difference in token sampling uncertainty across groups. This finding elucidates why temperature-based sampling adjustments fail to mitigate homogeneity bias. It suggests researchers should prioritize interventions targeting representation learning mechanisms and training corpus composition rather than inference-time output manipulations.
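Token-sampling uncertainty can be quantified in several ways; one standard choice is the Shannon entropy of the model's next-token distribution, averaged over the generated positions. The sketch below assumes per-step logits are available and uses made-up values. It also shows why temperature acts on this quantity: dividing logits by a higher temperature flattens the distribution and raises entropy uniformly, which does not by itself change which content a group is associated with. The function names are hypothetical and this is not the paper's exact metric.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_token_entropy(step_logits, temperature=1.0):
    """Hypothetical uncertainty score: mean next-token entropy across steps."""
    return sum(entropy(softmax(l, temperature)) for l in step_logits) / len(step_logits)


# Made-up logits for two generation steps in stories about one group.
group_logits = [[2.0, 1.0, 0.0], [3.0, 0.5, 0.5]]

for t in (0.5, 1.0, 2.0):
    print(t, round(mean_token_entropy(group_logits, t), 4))  # rises with t
```

Computing this score separately for each group and comparing the means is one way to check whether groups differ in sampling uncertainty, which is the kind of comparison the abstract reports finding little difference in.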