CY, AI, CL, Sep 14, 2024

Evaluating Cultural Awareness of LLMs for Yoruba, Malayalam, and English

arXiv:2410.01811v1, 9 citations, h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses the limited cultural understanding of LLMs for regional languages. The contribution is incremental, since it applies an existing cultural framework to new language contexts.

The study assessed the cultural awareness of large language models (LLMs) for Yoruba, Malayalam, and English using Hofstede's six cultural dimensions, finding that while LLMs show high cultural similarity for English, they fail to capture cultural nuances for the regional languages.

Although LLMs have been extremely effective in a large number of complex tasks, their understanding and functionality for regional languages and cultures are not well studied. In this paper, we explore the ability of various LLMs to comprehend the cultural aspects of two regional languages: Malayalam (state of Kerala, India) and Yoruba (West Africa). Using Hofstede's six cultural dimensions: Power Distance (PDI), Individualism (IDV), Motivation towards Achievement and Success (MAS), Uncertainty Avoidance (UAV), Long Term Orientation (LTO), and Indulgence (IVR), we quantify the cultural awareness of LLM-based responses. We demonstrate that although LLMs show a high cultural similarity for English, they fail to capture the cultural nuances across these 6 metrics for Malayalam and Yoruba. We also highlight the need for large-scale regional language LLM training with culturally enriched datasets. This will have huge implications for enhancing the user experience of chat-based LLMs and also improving the validity of large-scale LLM agent-based market research.
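
The abstract does not say how "cultural similarity" is computed from the six dimension scores. As a minimal sketch only, assuming each dimension is scored on a 0 to 100 scale and that similarity is taken as one minus the normalized mean absolute gap, a comparison between model-derived scores and reference scores might look like the following. The function names, the scale, and all numbers are hypothetical placeholders, not values or methods from the paper.

```python
from statistics import mean

# Dimension labels as abbreviated in the abstract.
DIMENSIONS = ("PDI", "IDV", "MAS", "UAV", "LTO", "IVR")


def dimension_gaps(model_scores, reference_scores):
    """Absolute gap per dimension between model-derived and reference scores."""
    return {d: abs(model_scores[d] - reference_scores[d]) for d in DIMENSIONS}


def cultural_similarity(model_scores, reference_scores, scale=100.0):
    """Return 1.0 for identical profiles, lower as the mean gap grows.

    Assumption: every score lies in [0, scale]. This metric is an
    illustrative choice, not the one used by the authors.
    """
    gaps = dimension_gaps(model_scores, reference_scores)
    return 1.0 - mean(gaps.values()) / scale


# Placeholder profiles for illustration only (not real survey or model data).
reference = {"PDI": 40, "IDV": 60, "MAS": 50, "UAV": 30, "LTO": 70, "IVR": 20}
model = {"PDI": 50, "IDV": 60, "MAS": 40, "UAV": 30, "LTO": 60, "IVR": 20}

similarity = cultural_similarity(model, reference)
```

Under this sketch, running the same comparison once per language would yield one similarity value each for English, Malayalam, and Yoruba, which is the kind of per-language contrast the abstract describes.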

