CL | AI | Jun 14, 2025

Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model

arXiv:2506.12388v1 | 4 citations | h-index: 17 | ACL
Originality: Incremental advance
AI Analysis

The paper addresses a fundamental problem in multilingual AI that matters for applications requiring efficient processing across many languages, though the contribution appears incremental because it builds on existing mixture-of-experts and language-grouping techniques.

The paper tackles the curse of multilinguality in multilingual LLMs, where language competition leads to poor performance, by proposing a method to dynamically group similar languages and scale parameters via mixture-of-experts layers. Experimental results on 18 to 128 languages show the method reduces negative transfer and significantly boosts multilingual performance with fewer parameters.

The curse of multilinguality is a fundamental problem of multilingual Large Language Models (LLMs), in which competition among a large number of languages results in inferior performance. It mainly stems from limited capacity and negative transfer between dissimilar languages. To address this issue, we propose a method to dynamically group languages and scale up the parameters of a multilingual LLM while boosting positive transfer among similar languages. Specifically, the model is first tuned on monolingual corpora to determine the parameter deviation in each layer and to quantify the similarity between languages. Layers with larger deviations are extended to mixture-of-experts layers to reduce competition between languages, where one expert module serves one group of similar languages. Experimental results on 18 to 128 languages show that our method reduces the negative transfer between languages and significantly boosts multilingual performance with fewer parameters. Such language group specialization on experts benefits new language adaptation and reduces interference with previously learned multilingual knowledge.
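
The pipeline in the abstract (measure per-layer parameter deviation after monolingual tuning, quantify language similarity, then give each group of similar languages its own expert in the most-deviating layers) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the mean-absolute-change deviation metric, cosine similarity over parameter deltas, the greedy threshold grouping, the toy data, and all function names.

```python
import numpy as np

def layer_deviation(deltas):
    """Mean absolute parameter change per layer, averaged over languages.

    deltas has shape (n_languages, n_layers, n_params) and holds
    tuned_weights - base_weights for each monolingually tuned copy.
    The metric is an assumed stand-in, not the paper's definition.
    """
    return np.abs(deltas).mean(axis=(0, 2))

def language_similarity(deltas):
    """Cosine similarity between the flattened deltas of every language pair."""
    flat = deltas.reshape(deltas.shape[0], -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return unit @ unit.T

def select_layers(deviation, k):
    """Indices of the k layers with the largest deviation (candidates for MoE)."""
    return sorted(np.argsort(deviation)[::-1][:k].tolist())

def group_languages(similarity, threshold):
    """Greedy grouping: join the first group whose members are all similar enough."""
    groups = []
    for lang in range(similarity.shape[0]):
        for group in groups:
            if all(similarity[lang, member] >= threshold for member in group):
                group.append(lang)
                break
        else:
            groups.append([lang])
    return groups

# Toy data (hypothetical): 4 languages, 3 layers, 4 parameters per layer.
# Layers 0 and 2 barely move; layer 1 moves along one of two directions.
small = 0.01 * np.ones(4)
deltas = np.array([
    [small, [1.0, 0.0, 0.0, 0.0], small],  # language 0
    [small, [0.9, 0.1, 0.0, 0.0], small],  # language 1, close to language 0
    [small, [0.0, 1.0, 0.0, 0.0], small],  # language 2
    [small, [0.1, 0.9, 0.0, 0.0], small],  # language 3, close to language 2
])

moe_layers = select_layers(layer_deviation(deltas), k=1)
groups = group_languages(language_similarity(deltas), threshold=0.5)
print("layers to extend to MoE:", moe_layers)
print("language groups (one expert each):", groups)
```

In this toy setup only the middle layer changes substantially, so it is the one selected for expansion, and the two pairs of languages that moved in similar directions end up sharing an expert. A real implementation would operate on actual model checkpoints and would need the paper's own deviation and similarity measures.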

