CL CV · Oct 20, 2024

CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts

Malvina Nikandrou, Georgios Pantazopoulos, Nikolas Vitsakis, Ioannis Konstas, Alessandro Suglia

arXiv:2410.15453v2 · 12.2 · 18 citations · h-index: 11 · Has Code · NAACL

Originality: Incremental advance

AI Analysis

This work addresses cultural inclusivity in AI for global users by evaluating current vision and language models and highlighting their limitations. It represents an incremental step in benchmarking cultural understanding.

The paper introduces CROPE, a visual question answering benchmark that assesses vision and language models' knowledge of culture-specific concepts and their ability to adapt using contextual information. It reveals large performance disparities and shows that models struggle to use multimodal cues for cultural adaptation.

As Vision and Language models (VLMs) are reaching users across the globe, assessing their cultural understanding has become a critical challenge. In this paper, we introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts and evaluate the capacity for cultural adaptation through contextual information. This allows us to distinguish between parametric knowledge acquired during training and contextual knowledge provided during inference via visual and textual descriptions. Our evaluation of several state-of-the-art open VLMs shows large performance disparities between culture-specific and common concepts in the parametric setting. Moreover, experiments with contextual knowledge indicate that models struggle to effectively utilize multimodal information and bind culture-specific concepts to their depictions. Our findings reveal limitations in the cultural understanding and adaptability of current VLMs that need to be addressed toward more culturally inclusive models.
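The abstract contrasts accuracy on culture-specific versus common concepts, separately for the parametric setting (no extra context) and the contextual setting (visual and textual descriptions supplied at inference). The sketch below shows one way such a comparison could be tabulated. It is not the authors' evaluation code: the `Result` record, its field names, and the `accuracy_by_group` and `concept_gap` helpers are all hypothetical, and the toy records are illustrative inputs, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One scored answer. Hypothetical record format, not CROPE's actual schema."""
    setting: str        # assumed labels: "parametric" or "contextual"
    concept_type: str   # assumed labels: "culture_specific" or "common"
    correct: bool


def accuracy_by_group(results):
    """Return accuracy for each (setting, concept_type) pair seen in results."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.setting, r.concept_type)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}


def concept_gap(acc, setting):
    """Accuracy on common concepts minus accuracy on culture-specific ones."""
    return acc[(setting, "common")] - acc[(setting, "culture_specific")]


# Toy records for illustration only.
toy = [
    Result("parametric", "common", True),
    Result("parametric", "common", True),
    Result("parametric", "culture_specific", True),
    Result("parametric", "culture_specific", False),
]
acc = accuracy_by_group(toy)
print(concept_gap(acc, "parametric"))  # 0.5
```

A positive gap means the model does better on common concepts; computing the same gap for the contextual setting would indicate whether supplied descriptions narrow the disparity.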

