CL · Apr 3, 2024

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

Takeshi Kojima, Itsuki Okimura, Yusuke Iwasawa, Hitomi Yanaka, Yutaka Matsuo

arXiv:2404.02431v1 · 26.489 citations · h-index: 20 · Has Code · NAACL

Originality: Incremental advance

AI Analysis

This work offers insight into the internal mechanisms of multilingual models that could help control which language a model outputs, though it is incremental because it builds on existing model-analysis work.

The study investigated how decoder-based multilingual pre-trained language models handle different languages by identifying language-specific neurons. It found that fewer than 5% of these neurons overlap between languages and that tampering with fewer than 1% of a model's neurons significantly changes the probability that the model generates text in a given target language.
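The identification step and the overlap figure can be illustrated with a small sketch. It assumes, as a simplification, that each neuron is scored by the average precision (AP) of its activation when used to rank target-language texts above texts in other languages, and that the k highest-scoring neurons form that language's set; it mirrors the general idea but is not claimed to reproduce the authors' exact procedure. The function names, the activation layout, and the toy numbers are hypothetical. Overlap is then the fraction of one language's set that reappears in another's.

```python
from typing import List, Sequence, Set


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP of ranking texts by one neuron's activation; label 1 = target language."""
    # Sort texts by activation, highest first (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def top_k_neurons(activations: List[List[float]], labels: Sequence[int], k: int) -> Set[int]:
    """activations[i][j] is neuron j's activation on text i (hypothetical layout)."""
    num_neurons = len(activations[0])
    ap = [average_precision([row[j] for row in activations], labels)
          for j in range(num_neurons)]
    # Keep the k neurons whose activation best identifies the target language.
    ranked = sorted(range(num_neurons), key=lambda j: ap[j], reverse=True)
    return set(ranked[:k])


def overlap_ratio(a: Set[int], b: Set[int]) -> float:
    """Fraction of neurons in set a that also appear in set b."""
    return len(a & b) / len(a) if a else 0.0


# Toy data: texts 0-1 are English, texts 2-3 are German; 3 neurons per text.
activations = [
    [0.9, 0.1, 0.5],
    [0.8, 0.0, 0.2],
    [0.1, 0.9, 0.4],
    [0.0, 0.7, 0.3],
]
english = top_k_neurons(activations, [1, 1, 0, 0], k=1)
german = top_k_neurons(activations, [0, 0, 1, 1], k=1)
print(english, german, overlap_ratio(english, german))  # {0} {1} 0.0
```

In this toy, neuron 0 fires only on English and neuron 1 only on German, so the two sets are disjoint; a real model would have thousands of neurons per layer and many texts per language.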

Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs, specifically examining the existence of neurons that fire "uniquely for each language" within decoder-only multilingual PLMs. We analyze six languages: English, German, French, Spanish, Chinese, and Japanese, and show that language-specific neurons are unique, with only a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers. This trend remains consistent across languages and models. Additionally, we tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability of the target language occurring in generated text.
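The tampering experiment and the per-layer count can be illustrated with a toy sketch. Here a two-layer ReLU network with hand-set weights stands in for a decoder PLM, neurons are addressed by hypothetical (layer, index) pairs, and "tampering" means overwriting those neurons' activations with fixed values during the forward pass. The override values are arbitrary assumptions, not the values used in the paper; in a real model the same overwrite could be attached to the feed-forward activations, for example through a PyTorch forward hook.

```python
import math
from typing import Dict, List, Sequence, Tuple

Neuron = Tuple[int, int]  # (layer index, neuron index); hypothetical addressing scheme
Matrix = List[List[float]]


def matvec(weights: Matrix, xs: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Sequence[float], layers: List[Matrix], readout: Matrix,
            overrides: Dict[Neuron, float]) -> List[float]:
    """Run a toy ReLU network, overwriting selected neurons with fixed values."""
    h = list(x)
    for layer_idx, weights in enumerate(layers):
        h = [max(0.0, v) for v in matvec(weights, h)]
        # Tampering: replace chosen activations before the next layer reads them.
        for (layer, j), value in overrides.items():
            if layer == layer_idx:
                h[j] = value
    return softmax(matvec(readout, h))


def count_by_layer(neurons: Sequence[Neuron], num_layers: int) -> List[int]:
    """Histogram of identified neurons per layer."""
    counts = [0] * num_layers
    for layer, _ in neurons:
        counts[layer] += 1
    return counts


# Toy model: output 0 stands for "English", output 1 for "German".
identity = [[1.0, 0.0], [0.0, 1.0]]
layers = [identity, identity]
x = [2.0, 0.0]
baseline = forward(x, layers, identity, {})
tampered = forward(x, layers, identity, {(1, 0): 0.0, (1, 1): 2.0})
print(round(baseline[1], 3), round(tampered[1], 3))  # 0.119 0.881
print(count_by_layer([(0, 0), (0, 1), (1, 1)], num_layers=3))  # [2, 1, 0]
```

Overwriting just two activations flips the toy output from "English" to "German". Applied to a real model's identified neurons, a per-layer histogram like `count_by_layer` is one way to check whether they cluster in the first and last layers.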

