CL · AI · LG · Sep 21, 2025

The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs

arXiv:2509.17030v1 · 4 citations · h-index: 1 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses a fundamental mechanism in multilingual AI processing and offers insights into neural network interpretability for researchers and practitioners, though it is an incremental advance that builds on existing frameworks.

The paper tackles the underexplored internal dynamics of how multilingual LLMs transform language-specific representations, proposing and validating the Transfer Neurons Hypothesis that identifies neurons responsible for transferring between language-specific and shared semantic latent spaces, and showing these neurons are critical for reasoning.

Recent studies have suggested a processing framework for multilingual inputs in decoder-based LLMs: early layers convert inputs into English-centric and language-agnostic representations; middle layers perform reasoning within an English-centric latent space; and final layers generate outputs by transforming these representations back into language-specific latent spaces. However, the internal dynamics of such transformation and the underlying mechanism remain underexplored. Towards a deeper understanding of this framework, we propose and empirically validate The Transfer Neurons Hypothesis: certain neurons in the MLP module are responsible for transferring representations between language-specific latent spaces and a shared semantic latent space. Furthermore, we show that one function of language-specific neurons, as identified in recent studies, is to facilitate movement between latent spaces. Finally, we show that transfer neurons are critical for reasoning in multilingual LLMs.
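The abstract does not spell out how transfer neurons are identified. One plausible way to make the hypothesis concrete, offered here as an assumption rather than the authors' procedure, is to represent each latent space at a given layer by the centroid of its hidden states, then score each MLP neuron by how much its write into the residual stream (its activation times its output, or "value", vector) shrinks the distance from a hidden state to a target centroid. The function names, the Euclidean metric, and the toy vectors below are all hypothetical.

```python
# Hypothetical sketch: score MLP neurons by how far their contribution moves a
# hidden state toward a target latent-space centroid. This is an illustrative
# criterion, not the paper's confirmed method.
from math import dist  # Euclidean distance (Python 3.8+)
from typing import List, Sequence

Vector = List[float]


def centroid(states: Sequence[Sequence[float]]) -> Vector:
    """Mean of a set of hidden states, e.g. all states of one language at one layer."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]


def transfer_scores(
    hidden: Sequence[float],
    activations: Sequence[float],
    value_vectors: Sequence[Sequence[float]],
    target: Sequence[float],
) -> List[float]:
    """For each neuron i, return dist(h, c) - dist(h + a_i * v_i, c).

    A positive score means the neuron's write into the residual stream moves
    the hidden state h closer to the target centroid c.
    """
    base = dist(hidden, target)
    scores = []
    for a, v in zip(activations, value_vectors):
        moved = [h + a * vi for h, vi in zip(hidden, v)]
        scores.append(base - dist(moved, target))
    return scores


def top_k_neurons(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k neurons that move the state furthest toward the target."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy example: a "language-specific" hidden state and a hypothetical
# shared-space centroid built from two toy states.
shared_centroid = centroid([[2.0, 0.0], [0.0, 0.0]])  # -> [1.0, 0.0]
hidden_state = [0.0, 1.0]
activations = [1.0, 0.5, 0.0]
value_vectors = [[1.0, -1.0], [0.0, 1.0], [5.0, 5.0]]

scores = transfer_scores(hidden_state, activations, value_vectors, shared_centroid)
# Neuron 0 lands exactly on the centroid, neuron 1 moves away, neuron 2 is inactive.
best = top_k_neurons(scores, 2)  # -> [0, 2]
```

Swapping the target from a shared-space centroid to a language-specific centroid would score movement in the opposite direction, matching the hypothesis's two-way framing (into the shared space early, back out in the final layers). Whether the paper uses this exact distance-reduction criterion is not stated in the abstract.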
