Multilingual Language Models are not Multicultural: A Case Study in Emotion
The findings expose a critical limitation of multilingual LMs for tasks that require emotional sensitivity across diverse cultures; the work is an incremental step toward addressing cultural bias in AI.
The study investigates whether multilingual language models reflect cultural variation in emotion. It finds that embeddings from models such as XLM-RoBERTa are Anglocentric and that generative models such as ChatGPT reflect Western norms, so neither captures culturally appropriate emotional nuance.
Emotions are experienced and expressed differently across the world. To use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion, and we highlight possible research directions towards correcting this.