CL · AI · Jan 7, 2025

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

arXiv:2501.03681v1 · 30 citations · h-index: 10 · COLING
Originality: Incremental advance
AI Analysis

This work addresses computational inefficiency and catastrophic forgetting in multilingual reasoning for AI systems, though the contribution is incremental because it builds on existing fine-tuning paradigms.

The paper tackles inefficient multilingual reasoning in large language models by proposing SLAM, an approach that fine-tunes only the feed-forward sub-layers of 6 layers (6.5-8% of parameters) in 7B and 13B models. SLAM achieves higher average performance than strong baselines across 10 languages and reduces training time by a factor of 4.1-11.9 compared to two-stage methods.

Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies use a full-parameter, two-stage training paradigm that teaches models first to understand non-English questions and then to reason. However, this method suffers from both substantial computational resource consumption and catastrophic forgetting. The fundamental cause is that, although the primary goal of the first stage is to enhance multilingual comprehension, an excessive number of irrelevant layers and parameters are tuned during that stage. Given our finding that the representation learning of languages is conducted only in the lower layers, we propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, amounting to 6.5-8% of all parameters in 7B and 13B LLMs, yet achieves higher average performance than all strong baselines across 10 languages. Meanwhile, SLAM involves only one training stage, reducing training time by a factor of 4.1-11.9 compared to the two-stage method.
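The core of the approach, tuning only the feed-forward sub-layers of a few selected layers while freezing everything else, can be illustrated as a parameter-selection step. This is a minimal sketch under stated assumptions, not the authors' code: it assumes LLaMA-style parameter names in the Hugging Face convention (`model.layers.{i}.mlp.*`), uses a hypothetical toy parameter inventory (`Param`, `build_toy_params`, `select_ffn_layers` are illustrative names) instead of a real model, and takes the layer indices as given, since the abstract does not spell out how the identification step chooses them.

```python
import re
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter: name, size, trainable flag."""
    name: str
    numel: int
    requires_grad: bool = True


def build_toy_params(n_layers, hidden, inter, vocab):
    """Build a LLaMA-style parameter inventory (assumed naming, toy sizes)."""
    params = [Param("model.embed_tokens.weight", vocab * hidden)]
    for i in range(n_layers):
        prefix = f"model.layers.{i}."
        # Attention projections (kept frozen by the selection below).
        for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
            params.append(Param(prefix + f"self_attn.{proj}.weight", hidden * hidden))
        # Feed-forward (mlp) sub-layer: the only part eligible for tuning.
        for proj in ("gate_proj", "up_proj"):
            params.append(Param(prefix + f"mlp.{proj}.weight", inter * hidden))
        params.append(Param(prefix + "mlp.down_proj.weight", hidden * inter))
        params.append(Param(prefix + "input_layernorm.weight", hidden))
        params.append(Param(prefix + "post_attention_layernorm.weight", hidden))
    params.append(Param("model.norm.weight", hidden))
    params.append(Param("lm_head.weight", vocab * hidden))
    return params


def select_ffn_layers(params, layer_ids):
    """Freeze all parameters except the mlp sub-layers of the chosen layers.

    Returns the fraction of parameters left trainable.
    """
    pattern = re.compile(r"model\.layers\.(\d+)\.mlp\.")
    for p in params:
        m = pattern.match(p.name)
        p.requires_grad = bool(m) and int(m.group(1)) in layer_ids
    trainable = sum(p.numel for p in params if p.requires_grad)
    total = sum(p.numel for p in params)
    return trainable / total


if __name__ == "__main__":
    # Hypothetical choice: tune the mlp sub-layers of the two lowest layers.
    toy = build_toy_params(n_layers=4, hidden=8, inter=16, vocab=10)
    frac = select_ffn_layers(toy, layer_ids={0, 1})
    print(f"trainable fraction: {frac:.3f}")
```

On a real PyTorch model the same loop would iterate over `model.named_parameters()` and set each parameter's `requires_grad` flag; if the optimizer is then given only the trainable tensors, gradients and optimizer state are kept for just the selected feed-forward weights, which is where selective tuning saves memory and time.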

Code Implementations: 1 repo