CL · AI · Mar 18

From Words to Worlds: Benchmarking Cross-Cultural Cultural Understanding in Machine Translation

arXiv:2603.17303 · 57.3 · h-index: 8
AI Analysis

This paper addresses the fragmented evaluation of cross-cultural understanding in machine translation by providing a systematic framework for researchers and developers. The contribution is incremental, since it builds on existing benchmarking efforts.

The authors address the difficulty machine translation systems have in accurately translating culture-laden expressions such as idioms and slang. They introduce CulT-Eval, a benchmark with over 7,959 instances, and find that current models struggle to preserve cultural meaning and nuance.

Culture-expressions, such as idioms, slang, and culture-specific items (CSIs), are pervasive in natural language and encode meanings that go beyond literal linguistic form. Accurately translating such expressions remains challenging for machine translation systems. Despite this, existing benchmarks remain fragmented and do not provide a systematic framework for evaluating translation performance on culture-loaded expressions. To address this gap, we introduce CulT-Eval, a benchmark designed to evaluate how models handle different types of culturally grounded expressions. CulT-Eval comprises over 7,959 carefully curated instances spanning multiple types of culturally grounded expressions, with a comprehensive error taxonomy covering culturally grounded expressions. Through extensive evaluation of large language models and detailed analysis, we identify recurring and systematic failure modes that are not adequately captured by existing automatic metrics. Accordingly, we propose a complementary evaluation metric that targets culturally induced meaning deviations overlooked by standard MT metrics. The results indicate that current models struggle to preserve culturally grounded meaning and to capture the cultural and contextual nuances essential for accurate translation. Our benchmark and code are available at https://anonymous.4open.science/r/CulT-Eval-E75D/.
