CL · Mar 21, 2025

When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts

arXiv:2503.16826v1 · 5 citations · h-index: 7 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses cultural bias in AI models, which matters for fairness to users worldwide. The advance is incremental because it builds on existing bias evaluation methods.

The paper tackles cultural bias in multimodal large language models (MLLMs) by introducing MixCuBe, a cross-cultural bias benchmark. It finds that MLLMs show higher accuracy and lower sensitivity to perturbations for high-resource cultures than for low-resource ones. GPT-4o, the best-performing model overall, shows up to a 58% difference in accuracy between the original and perturbed settings in low-resource cultures.

In a highly globalized world, it is important for multimodal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (a Korean food) in an image whether an Asian woman or an African man is eating it. However, current MLLMs over-rely on the visual features of the person, which leads to misclassification of the entities. To examine the robustness of MLLMs to different ethnicities, we introduce MixCuBe, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbation for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to a 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures. Our dataset is publicly available at: https://huggingface.co/datasets/kyawyethu/MixCuBe.
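The "difference in accuracy between the original and perturbed cultural settings" can be illustrated with a minimal sketch. It assumes a simple exact-match accuracy over predicted labels; the abstract does not spell out the paper's exact scoring, so the function names (`accuracy`, `accuracy_gap`) and the toy predictions below are hypothetical and are not taken from MixCuBe.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_gap(original_preds, perturbed_preds, labels):
    """Accuracy on original images minus accuracy on perturbed images.

    A larger positive gap means the model is more sensitive to the
    perturbation (here, a change in the depicted person's ethnicity).
    """
    return accuracy(original_preds, labels) - accuracy(perturbed_preds, labels)


# Toy, made-up data: the same four food images, before and after the
# person in each image is replaced (hypothetical model outputs).
labels = ["kimchi", "kimchi", "kimchi", "kimchi"]
original_preds = ["kimchi", "kimchi", "kimchi", "bibimbap"]
perturbed_preds = ["kimchi", "jollof rice", "fufu", "bibimbap"]

gap = accuracy_gap(original_preds, perturbed_preds, labels)
print(f"accuracy gap: {gap:.2f}")
```

Computing this gap separately per culture is one way to compare sensitivity between high-resource and low-resource settings, under the exact-match assumption stated above.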
