Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs
This work addresses cultural bias in AI, which matters to users who rely on LLMs for culturally sensitive applications, though its contribution is incremental because it builds on existing critiques of LLMs' multilingual capabilities.
The paper asks whether large language models (LLMs) conduct culture-aware reasoning by auditing their cultural inclusivity in a metaphor-generation creative writing task spanning five cultural settings. It finds stereotyped metaphor usage for certain settings, as well as Western defaultism.
Large language models (LLMs) are often described as multilingual because they can understand and respond in many languages. However, speaking a language is not the same as reasoning within a culture. This distinction motivates a critical question: do LLMs truly conduct culture-aware reasoning? This paper presents a preliminary computational audit of cultural inclusivity in a creative writing task. We empirically examine whether LLMs act as culturally diverse creative partners or merely as cultural translators that leverage a dominant conceptual framework with localized expressions. Using a metaphor generation task spanning five cultural settings and several abstract concepts as a case study, we find that the model exhibits stereotyped metaphor usage for certain settings, as well as Western defaultism. These findings suggest that merely prompting an LLM with a cultural identity does not guarantee culturally grounded reasoning.
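The abstract does not say how stereotyped metaphor usage or Western defaultism is measured. As a purely illustrative sketch, assume that each generated metaphor has already been annotated with a source domain and that an unconditioned prompt (no cultural identity) serves as a stand-in for the default framework. Under those assumptions, two rough signals could be computed per cultural setting. The first is the share of metaphors drawing on the setting's single most common source domain, where values near 1.0 suggest one stock image. The second is the mean per-concept Jaccard overlap with the unconditioned baseline, where high overlap suggests a default framework re-expressed with local wording. The function names, setting labels, toy data, and both metrics are hypothetical and are not the paper's method.

```python
from collections import Counter

# Hypothetical structure: setting -> {abstract concept -> source domains of generated metaphors}.
# Labels and domains below are placeholders, not data from the paper.
Generations = dict[str, list[str]]


def top_domain_share(domains_by_concept: Generations) -> float:
    """Fraction of a setting's metaphors that use its single most common source domain."""
    counts = Counter(d for domains in domains_by_concept.values() for d in domains)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def baseline_overlap(setting: Generations, baseline: Generations) -> float:
    """Mean per-concept Jaccard overlap between a culture-conditioned setting and the baseline."""
    shared_concepts = setting.keys() & baseline.keys()
    if not shared_concepts:
        return 0.0
    return sum(
        jaccard(set(setting[c]), set(baseline[c])) for c in shared_concepts
    ) / len(shared_concepts)


# Toy, made-up annotations for illustration only.
baseline = {"time": ["river", "money"], "love": ["journey", "fire"]}
setting_a = {"time": ["river", "money"], "love": ["journey", "lotus"]}
setting_b = {"time": ["harvest", "harvest"], "love": ["harvest", "tea"]}

for name, gen in {"setting_a": setting_a, "setting_b": setting_b}.items():
    print(
        f"{name}: top-domain share={top_domain_share(gen):.2f}, "
        f"baseline overlap={baseline_overlap(gen, baseline):.2f}"
    )
```

In this toy data, `setting_a` mostly reuses the baseline's domains, which is the translation-like pattern the abstract describes. `setting_b` diverges from the baseline but leans heavily on one domain, a rough stereotyping signal. Real audits would need validated annotations and would likely use richer measures than these two.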