CL · AI · CY · SOC-PH · May 25, 2025

Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment

arXiv:2505.21548v2 · 2 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work highlights a critical gap in the cultural alignment of LLMs used worldwide: current regional models align no better with local norms than global models do, and they fall short of true sovereignty.

The study evaluated whether regional LLMs for India align with local cultural values and practices. It found that they perform no better than global models, that a U.S. respondent is a closer proxy for Indian values than any Indic model, and that prompting and fine-tuning fail to improve alignment.

Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. Many countries are now building "regional" LLMs, but it remains unclear whether they reflect local values and practices or merely speak local languages. Using India as a case study, we evaluate six Indic and six global LLMs on two dimensions (values and practices) grounded in nationally representative surveys and community-sourced QA datasets. Across tasks, Indic models do not align better with Indian norms than global models; in fact, a U.S. respondent is a closer proxy for Indian values than any Indic model. Prompting and regional fine-tuning fail to recover alignment and can even degrade existing knowledge. We attribute this to scarce culturally grounded data, especially for pretraining. We position cultural evaluation as a first-class requirement alongside multilingual benchmarks and offer a reusable, community-grounded methodology. We call for native, community-authored corpora and thick x wide evaluations to build truly sovereign LLMs.
