CL · Apr 3, 2024

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

arXiv:2404.02431v1 · 88 citations · h-index: 20 · NAACL
AI Analysis

This work provides insights into the internal mechanisms of multilingual models, which could aid in controlling language outputs, though it is incremental as it builds on existing model analysis.

The study investigated how decoder-based multilingual pre-trained language models handle different languages by identifying language-specific neurons, finding that less than 5% of neurons overlap between languages and that tampering with under 1% of these neurons significantly alters language generation probabilities.

Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs, specifically examining whether decoder-only multilingual PLMs contain neurons that fire "uniquely for each language". We analyze six languages (English, German, French, Spanish, Chinese, and Japanese) and show that language-specific neurons are unique, with only a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers. This trend remains consistent across languages and models. Additionally, we tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability of target language occurrence in text generation.
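The two quantities in the abstract, overlap between per-language neuron sets and the effect of overwriting a small set of neurons, can be illustrated with a toy sketch. This is not the paper's procedure. The neuron indices, the overlap definition (shared neurons divided by the size of the smaller set), the fixed replacement value, and the helper names `overlap_ratio`, `pairwise_overlap`, and `tamper` are all assumptions made for illustration.

```python
from itertools import combinations


def overlap_ratio(a, b):
    """Share of the smaller set that also appears in the other set.

    Assumed definition for illustration; the paper may measure overlap differently.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pairwise_overlap(neurons_by_lang):
    """Overlap ratio for every unordered pair of languages."""
    return {
        (x, y): overlap_ratio(neurons_by_lang[x], neurons_by_lang[y])
        for x, y in combinations(sorted(neurons_by_lang), 2)
    }


def tamper(activations, neuron_ids, fixed_value):
    """Return a copy of the activations with the selected neurons overwritten."""
    out = list(activations)
    for i in neuron_ids:
        out[i] = fixed_value
    return out


# Hypothetical neuron indices per language (toy data, not results from the paper).
neurons = {"de": {0, 1, 2, 3}, "fr": {3, 4, 5, 6}, "ja": {7, 8, 9, 10}}
overlaps = pairwise_overlap(neurons)

# Overwrite the hypothetical "de" neurons in a toy activation vector.
patched = tamper([0.1] * 12, neurons["de"], 1.0)
```

In a real model, the overwrite would typically be applied during the forward pass, for example through a hook on the relevant layer, rather than on a plain list.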

Code Implementations: 1 repo
