CV, AI, CL | Sep 27, 2025

Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning

Haorui Yu, Yang Zhao, Yijia Chu, Qiufeng Yi

arXiv:2509.23311v210.23 citations | h-index: 2 | Proceedings of the 9th Widening NLP Workshop

Originality: Incremental advance

AI Analysis

This work examines cultural bias in vision-language models, a concern for researchers and developers of multimodal systems. Its contribution is incremental because it is limited to one specific diagnostic approach.

The study introduces a diagnostic framework for assessing how vision-language models reason about fire-themed cultural imagery. It reveals systematic biases: models correctly identify Western festivals but struggle with underrepresented cultural events and dangerously misclassify emergencies as celebrations.

Vision-Language Models (VLMs) often appear culturally competent but rely on superficial pattern matching rather than genuine cultural understanding. We introduce a diagnostic framework to probe VLM reasoning on fire-themed cultural imagery through both classification and explanation analysis. Testing multiple models on Western festivals, non-Western traditions, and emergency scenes reveals systematic biases: models correctly identify prominent Western festivals but struggle with underrepresented cultural events, frequently offering vague labels or dangerously misclassifying emergencies as celebrations. These failures expose the risks of symbolic shortcuts and highlight the need for cultural evaluation beyond accuracy metrics to ensure interpretable and fair multimodal systems.
