CL · May 25

On the Limits of Model Merging for Multilinguality in Pre-Training

arXiv:2605.25846
AI Analysis

For researchers in multilingual NLP: this paper shows that model merging, although effective in fine-tuning, does not extend trivially to language-specific pre-training.

The study tests whether merging monolingually pre-trained models can yield multilingual performance. It finds that merging any combination of monolingual models leads to performance collapse due to interference, and its analysis suggests that representational similarity is a prerequisite for successful merging.

Endowing models with consistent multilingual performance can be achieved by mixing pre-training data, or post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
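To make the two central notions concrete, here is a minimal sketch in Python. The abstract does not say which merging method or which similarity measure the authors use, so both choices below are assumptions made for illustration only: uniform parameter averaging stands in for "merging", and linear CKA (centered kernel alignment) stands in for "representational similarity". The names `merge_uniform` and `linear_cka`, and the toy dict-of-lists parameter format, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]
Matrix = Sequence[Sequence[float]]  # rows = examples, columns = features


def merge_uniform(models: List[StateDict]) -> StateDict:
    """Merge models by element-wise uniform averaging of their parameters.

    Assumption: all models share the same architecture, so parameter
    names and sizes match exactly.
    """
    if not models:
        raise ValueError("need at least one model to merge")
    names = models[0].keys()
    merged: StateDict = {}
    for name in names:
        tensors = []
        for m in models:
            if m.keys() != names:
                raise ValueError("parameter names differ between models")
            tensors.append(m[name])
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [sum(vals) / len(models) for vals in zip(*tensors)]
    return merged


def _center(X: Matrix) -> List[List[float]]:
    """Subtract the per-feature mean from every column."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - mu for v, mu in zip(row, means)] for row in X]


def _cross_sq_norm(X: Matrix, Y: Matrix) -> float:
    """Squared Frobenius norm of X^T Y."""
    total = 0.0
    for x_col in zip(*X):
        for y_col in zip(*Y):
            dot = sum(a * b for a, b in zip(x_col, y_col))
            total += dot * dot
    return total


def linear_cka(X: Matrix, Y: Matrix) -> float:
    """Linear CKA between two activation matrices over the same examples.

    Values near 1 mean the representations agree up to rotation and
    isotropic scaling; values near 0 mean they are unrelated.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of examples")
    Xc, Yc = _center(X), _center(Y)
    denom = math.sqrt(_cross_sq_norm(Xc, Xc) * _cross_sq_norm(Yc, Yc))
    if denom == 0.0:
        raise ValueError("representations have zero variance")
    return _cross_sq_norm(Xc, Yc) / denom
```

Under these assumptions, one would extract activations from two monolingual models on the same inputs, compute `linear_cka` layer by layer, and compare that similarity with how well the output of `merge_uniform` performs. The paper's actual experimental protocol may differ.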

