CL, Oct 7, 2025

The fragility of "cultural tendencies" in LLMs

arXiv:2510.05869v1
h-index: 5
Originality: Synthesis-oriented
AI Analysis

This paper challenges claims that LLMs encode cultural beliefs, which matters for researchers and practitioners in AI and linguistics. The contribution is incremental, since it re-evaluates existing work.

The paper critiques a prior study claiming LLMs show cultural tendencies based on prompt language, finding instead that these effects are fragile artifacts of specific models and tasks, with prompt language having minimal impact on outputs.

In a recent study, Lu, Song, and Zhang (2025) (LSZ) propose that large language models (LLMs), when prompted in different languages, display culturally specific tendencies. They report that the two models (i.e., GPT and ERNIE) respond in more interdependent and holistic ways when prompted in Chinese, and more independent and analytic ways when prompted in English. LSZ attribute these differences to deep-seated cultural patterns in the models, claiming that prompt language alone can induce substantial cultural shifts. While we acknowledge the empirical patterns they observed, we find their experiments, methods, and interpretations problematic. In this paper, we critically re-evaluate the methodology, theoretical framing, and conclusions of LSZ. We argue that the reported "cultural tendencies" are not stable traits but fragile artifacts of specific models and task design. To test this, we conducted targeted replications using a broader set of LLMs and a larger number of test items. Our results show that prompt language has minimal effect on outputs, challenging LSZ's claim that these models encode grounded cultural beliefs.

