CL · Mar 27

Toward Culturally Grounded Natural Language Processing

arXiv:2603.26013 · 86.3 · h-index: 3

Predicted impact: top 47% in CL · last 90 days
Originality: Highly original

AI Analysis

This paper addresses cultural bias and inclusivity in NLP for global users. It is a foundational critique rather than an incremental improvement.

The paper synthesizes over 50 studies to show that multilingual NLP models often lack cultural competence: models with strong multilingual performance can still misread culturally grounded cues and underperform in lower-resource settings. It proposes a research agenda that shifts the field toward modeling communicative ecologies for culturally grounded NLP.

Recent progress in multilingual NLP is often taken as evidence of broader global inclusivity, but a growing literature shows that multilingual capability and cultural competence come apart. This paper synthesizes over 50 papers from 2020–2026 spanning multilingual performance inequality, cross-lingual transfer, culture-aware evaluation, cultural alignment, multimodal local-knowledge modeling, benchmark design critiques, and community-grounded data practices. Across this literature, training data coverage remains a strong determinant of performance, yet it is not sufficient: tokenization, prompt language, translated benchmark design, culturally specific supervision, and multimodal context all materially affect outcomes. Recent work on Global-MMLU, CDEval, WorldValuesBench, CulturalBench, CULEMO, CulturalVQA, GIMMICK, DRISHTIKON, WorldCuisines, CARE, CLCA, and newer critiques of benchmark design and community-grounded evaluation shows that strong multilingual models can still flatten local norms, misread culturally grounded cues, and underperform in lower-resource or community-specific settings. We argue that the field should move from treating languages as isolated rows in a benchmark spreadsheet toward modeling communicative ecologies: the institutions, scripts, translation pipelines, domains, modalities, and communities through which language is used. On that basis, we propose a research agenda for culturally grounded NLP centered on richer contextual metadata, culturally stratified evaluation, participatory alignment, within-language variation, and multimodal community-aware design.
