CV · May 20, 2025

Uncovering Cultural Representation Disparities in Vision-Language Models

Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Srishti Yadav, Suman Debnath, Alejandro Salamanca, Desmond Elliott

arXiv:2505.14729v3 · 2 citations · h-index: 4 · IJCNLP-AACL

Originality: Synthesis-oriented

AI Analysis

The paper addresses biases in VLMs that could affect fairness in global applications, though its contribution is incremental: it evaluates existing models rather than proposing new solutions.

This work investigated cultural biases in Vision-Language Models (VLMs) by evaluating their performance on a country identification task using the Country211 dataset, revealing significant variations in accuracy across countries and prompting strategies.

Vision-Language Models (VLMs) have demonstrated impressive capabilities across a range of tasks, yet concerns about their potential biases remain. This work investigates the extent to which prominent VLMs exhibit cultural biases by evaluating their performance on an image-based country identification task. Using the geographically diverse Country211 dataset, we probe several large VLMs under various prompting strategies: open-ended questions and multiple-choice questions (MCQs), including challenging setups such as multilingual and adversarial settings. Our analysis aims to uncover disparities in model accuracy across countries and question formats, providing insights into how training data distribution and evaluation methodology might influence cultural biases in VLMs. The findings highlight significant variations in performance, suggesting that while VLMs possess considerable visual understanding, they inherit biases from their pre-training data and scale, which affect their ability to generalize uniformly across diverse global contexts.
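The disparity analysis described in the abstract can be illustrated with a minimal sketch. It assumes each model prediction is logged as a (true country, predicted country, prompt format) triple; the function names, the format labels `open_ended` and `mcq`, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def per_country_accuracy(records):
    """Return accuracy keyed by (prompt_format, true_country).

    records: iterable of (true_country, predicted_country, prompt_format).
    Matching is case-insensitive exact string match (an assumption).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_country, predicted, prompt_format in records:
        key = (prompt_format, true_country)
        total[key] += 1
        if predicted.strip().lower() == true_country.strip().lower():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def accuracy_gap(accuracies, prompt_format):
    """Spread between the best- and worst-served country for one format."""
    values = [acc for (fmt, _), acc in accuracies.items() if fmt == prompt_format]
    return max(values) - min(values)


# Hypothetical toy predictions, for illustration only.
records = [
    ("Japan", "Japan", "open_ended"),
    ("Japan", "Japan", "open_ended"),
    ("Nepal", "India", "open_ended"),
    ("Nepal", "Nepal", "open_ended"),
    ("Japan", "Japan", "mcq"),
    ("Nepal", "Nepal", "mcq"),
]

acc = per_country_accuracy(records)
for fmt in ("open_ended", "mcq"):
    print(fmt, accuracy_gap(acc, fmt))
```

A larger gap for one prompt format than another would indicate that the question format, and not only the image content, contributes to uneven performance across countries.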
