Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads
This work advances understanding of multilingual language models by isolating the attention heads responsible for mapping to target languages. The contribution is incremental and specific to multilingual reasoning tasks.
The study investigates attention heads in multilingual language models and identifies Retrieval-Transition heads (RTHs), which govern the transition to target-language output and are more vital for Chain-of-Thought reasoning than retrieval heads. Experiments across four benchmarks and two model families show that masking RTHs induces a larger performance drop than masking retrieval heads.
Recent work has identified a subset of attention heads in Transformers as retrieval heads, which are responsible for retrieving information from the context. In this work, we first investigate retrieval heads in multilingual contexts. In multilingual language models, we find that retrieval heads are often shared across multiple languages. Extending the study to a cross-lingual setting, we identify Retrieval-Transition heads (RTHs), which govern the transition to specific target-language output. Our experiments reveal that RTHs are distinct from retrieval heads and more vital for Chain-of-Thought reasoning in multilingual LLMs. Across four multilingual benchmarks (MMLU-ProX, MGSM, MLQA, and XQuAD) and two model families (Qwen-2.5 and Llama-3.1), we demonstrate that masking RTHs induces a larger performance drop than masking retrieval heads (RHs). Our work advances understanding of multilingual LMs by isolating the attention heads responsible for mapping to target languages.
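To make the head-masking comparison concrete, the following is a minimal sketch of ablating individual attention heads in a toy causal self-attention layer. It assumes that "masking" a head means zeroing that head's output before the output projection; the abstract does not specify the exact procedure, so this is an illustration and not the authors' implementation. The function name `multi_head_attention`, the parameter `masked_heads`, and the head indices used are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) positions get weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, wq, wk, wv, wo, n_heads, masked_heads=()):
    """Toy causal multi-head self-attention with optional head ablation.

    x: (seq_len, d_model); wq, wk, wv, wo: (d_model, d_model).
    masked_heads: indices of heads whose output is zeroed (assumed
    meaning of "masking" here) before the output projection.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Causal mask: each position attends only to itself and earlier ones.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    per_head = softmax(scores, axis=-1) @ v  # (n_heads, seq_len, d_head)

    # Ablation: multiply each masked head's output by zero.
    keep = np.ones(n_heads)
    for h in masked_heads:
        keep[h] = 0.0
    per_head = per_head * keep[:, None, None]

    merged = per_head.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ wo


# Hypothetical usage: compare the layer output with and without head 1.
rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 8, 4, 5
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv, wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))

full = multi_head_attention(x, wq, wk, wv, wo, n_heads)
ablated = multi_head_attention(x, wq, wk, wv, wo, n_heads, masked_heads=[1])
output_shift = np.abs(full - ablated).mean()
```

In an actual evaluation, one would presumably apply the same kind of ablation to the selected RTHs or RHs inside a full model and compare benchmark accuracy rather than raw layer outputs; that step is outside the scope of this sketch.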