CL · Jul 13, 2025

MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs

arXiv:2507.09701v1 · 9.64 citations · h-index: 20 · IEEE Transactions on Audio, Speech, and Language Processing

Originality: Incremental advance

AI Analysis

This work addresses cultural fairness in LLMs that serve diverse global users and contributes a novel, domain-specific evaluation framework.

The authors tackle cultural bias and limited cross-cultural understanding in large language models by introducing MCEval, a dynamic multilingual evaluation framework. Their evaluation reveals performance disparities across 13 cultures and 13 languages, shows that optimal cultural performance depends on language-culture alignment, and exposes fairness issues in English-centric approaches.

Large language models exhibit cultural biases and limited cross-cultural understanding, particularly when serving diverse global user populations. We propose MCEval, a novel multilingual evaluation framework that employs dynamic cultural question construction and enables causal analysis through Counterfactual Rephrasing and Confounder Rephrasing. Our evaluation spans 13 cultures and 13 languages, systematically assessing both cultural awareness and cultural bias across different linguistic scenarios. The framework provides 39,897 cultural awareness instances and 17,940 cultural bias instances. Experimental results reveal performance disparities across linguistic scenarios, demonstrating that optimal cultural performance is linked not only to training data distribution but also to language-culture alignment. The results also expose a fairness issue: approaches that appear successful in the English scenario create substantial disadvantages. MCEval is the first comprehensive multilingual cultural evaluation framework and provides deeper insights into LLMs' cultural understanding.
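The abstract does not specify MCEval's metrics or data format. As a minimal sketch of the kind of culture-by-language comparison it describes, assuming each scored instance is reduced to a hypothetical `(culture, language, is_correct)` record, the Python below aggregates accuracy per culture-language pair and compares each culture's score in its aligned language with its score in English. The record schema, the function names `accuracy_grid` and `alignment_report`, and the toy records are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical record: (culture, language, is_correct).
# This schema is an assumption for illustration, not MCEval's actual format.
Record = tuple[str, str, bool]


def accuracy_grid(records: list[Record]) -> dict[tuple[str, str], float]:
    """Aggregate accuracy for every (culture, language) pair seen in records."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    correct: dict[tuple[str, str], int] = defaultdict(int)
    for culture, language, is_correct in records:
        totals[(culture, language)] += 1
        correct[(culture, language)] += int(is_correct)
    return {key: correct[key] / totals[key] for key in totals}


def alignment_report(
    grid: dict[tuple[str, str], float],
    aligned_language: dict[str, str],
    pivot: str = "en",
) -> dict[str, tuple[float, float, float]]:
    """Compare each culture's accuracy in its aligned language vs. the pivot language.

    aligned_language maps culture -> its (assumed) native language.
    Returns culture -> (aligned_acc, pivot_acc, aligned_acc - pivot_acc),
    skipping cultures not evaluated in both languages.
    """
    report: dict[str, tuple[float, float, float]] = {}
    for culture, language in aligned_language.items():
        aligned = grid.get((culture, language))
        english = grid.get((culture, pivot))
        if aligned is None or english is None:
            continue
        report[culture] = (aligned, english, aligned - english)
    return report


if __name__ == "__main__":
    # Toy records invented purely to exercise the code; they are not MCEval results.
    records: list[Record] = [
        ("Japan", "ja", True), ("Japan", "ja", True),
        ("Japan", "en", True), ("Japan", "en", False),
        ("France", "fr", False), ("France", "en", True),
    ]
    grid = accuracy_grid(records)
    print(alignment_report(grid, {"Japan": "ja", "France": "fr"}))
    # prints {'Japan': (1.0, 0.5, 0.5), 'France': (0.0, 1.0, -1.0)}
```

A positive gap means a culture scores higher in its aligned language than in English; a negative gap means English does better. Reporting the per-culture gap, rather than a single English-only score, is one simple way to surface the kind of disparity the abstract warns about.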
