Isolating Culture Neurons in Multilingual Large Language Models
This work addresses the challenge of understanding, and potentially editing, cultural biases in multilingual language models, with implications for fairness and inclusivity. Methodologically, it builds incrementally on existing neuron localization methods.
The researchers investigate how multilingual large language models encode culture by localizing and isolating culture-specific neurons, using MUREL, a curated dataset of 85.2 million tokens spanning six cultures. They find that LLMs encode different cultures in distinct neuron populations, predominantly in upper layers, and that these neurons can be modulated largely independently of language-specific neurons and of neurons specific to other cultures.
Language and culture are deeply intertwined, yet it has been unclear how and where multilingual large language models encode culture. Here, we build on an established methodology for identifying language-specific neurons to localize and isolate culture-specific neurons, carefully disentangling their overlap and interaction with language-specific neurons. To facilitate our experiments, we introduce MUREL, a curated dataset of 85.2 million tokens spanning six different cultures. Our localization and intervention experiments show that LLMs encode different cultures in distinct neuron populations, predominantly in upper layers, and that these culture neurons can be modulated largely independently of language-specific neurons or those specific to other cultures. These findings suggest that cultural knowledge and propensities in multilingual language models can be selectively isolated and edited, with implications for fairness, inclusivity, and alignment. Code and data are available at https://github.com/namazifard/Culture_Neurons.
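The localization and disentangling steps described in the abstract can be illustrated with a minimal sketch. It assumes an entropy-based selection criterion: a neuron counts as group-specific when its activation probability is concentrated on one group. The abstract does not name the exact criterion, so this is an illustrative stand-in and not the authors' confirmed procedure. The function names (`select_specific_neurons`, `isolate_culture_neurons`), the toy activation probabilities, and the cutoff `k` are all hypothetical.

```python
import numpy as np


def select_specific_neurons(act_prob, k, eps=1e-12):
    """Return the sorted indices of the k neurons with the lowest entropy.

    act_prob has shape (n_groups, n_neurons); entry [g, n] is the assumed
    fraction of tokens from group g on which neuron n is active.
    """
    act_prob = np.asarray(act_prob, dtype=float)
    # Turn each neuron's activation probabilities into a distribution over groups.
    dist = act_prob / (act_prob.sum(axis=0, keepdims=True) + eps)
    # Low entropy means the neuron fires mostly for a single group.
    entropy = -(dist * np.log(dist + eps)).sum(axis=0)
    return np.sort(np.argsort(entropy)[:k])


def isolate_culture_neurons(culture_idx, language_idx):
    """Drop candidate culture neurons that are also language-specific."""
    return np.setdiff1d(culture_idx, language_idx)


# Toy example (made-up numbers): 3 cultures x 6 neurons.
culture_prob = np.array([
    [0.90, 0.30, 0.30, 0.30, 0.05, 0.30],
    [0.05, 0.30, 0.30, 0.30, 0.90, 0.30],
    [0.05, 0.30, 0.30, 0.30, 0.05, 0.30],
])
# Toy example (made-up numbers): 2 languages x the same 6 neurons.
language_prob = np.array([
    [0.50, 0.50, 0.90, 0.50, 0.95, 0.50],
    [0.50, 0.50, 0.10, 0.50, 0.05, 0.50],
])

culture_candidates = select_specific_neurons(culture_prob, k=2)
language_neurons = select_specific_neurons(language_prob, k=2)
culture_only = isolate_culture_neurons(culture_candidates, language_neurons)
print(culture_candidates, language_neurons, culture_only)
```

In this toy setting, a neuron that looks culture-specific but is also language-specific is removed from the culture set, which mirrors the overlap handling the abstract describes. An intervention experiment would then modulate only the activations of the remaining neurons.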