CV, CL | May 28, 2025

Evaluation of Cultural Competence of Vision-Language Models

Srishti Yadav, Lauren Tilton, Maria Antoniak, Taylor Arnold, Jiaang Li, Siddhesh Milind Pawar, Antonia Karamolegkou, Stella Frank, Zhaochong An, Negar Rostamzadeh, Daniel Hershcovich, Serge Belongie

arXiv:2505.22793v2 | 2 citations | h-index: 9

Originality: Synthesis-oriented

AI Analysis

The paper addresses cultural bias in VLMs, which matters for the diverse applications built on them, but the contribution is incremental because it builds on existing studies.

The paper tackles the lack of cultural competency in vision-language models. It argues that foundational methodologies from visual culture studies are necessary for the cultural analysis of images, and it proposes five frameworks, each corresponding to a cultural dimension, for a more complete evaluation.

Modern vision-language models (VLMs) often fail at cultural competency evaluations and benchmarks. Given the diversity of applications built upon VLMs, there is renewed interest in understanding how they encode cultural nuances. While individual aspects of this problem have been studied, we still lack a comprehensive framework for systematically identifying and annotating the nuanced cultural dimensions present in images for VLMs. This position paper argues that foundational methodologies from visual culture studies (cultural studies, semiotics, and visual studies) are necessary for cultural analysis of images. Building upon this review, we propose a set of five frameworks, corresponding to cultural dimensions, that must be considered for a more complete analysis of the cultural competencies of VLMs.
