CL · Apr 19

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

arXiv:2604.17633 · 65.2 · h-index: 16
AI Analysis

For researchers studying multilingual models, this work offers a fine-grained timeline of how translation capabilities develop during pretraining and identifies a two-phase process that earlier analyses, based on sparse checkpoints, had not characterized.

This paper investigates how cross-lingual generalization emerges during multilingual pretraining by training a 1.7B model on nine languages and analyzing checkpoints at fine granularity. It finds that translation develops in two phases: an initial copying-dominated phase followed by a phase where more generalizing translation mechanisms emerge.

Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges, particularly in the early phases of learning. To study the early trajectory of linguistic and translation capabilities, we pretrain a multilingual 1.7B model on nine diverse languages, capturing checkpoints at a much finer granularity. We further introduce a novel word-level translation dataset and trace how translation develops over training through behavioral analyses, model-component analysis, and parameter-based ablations. We find that the model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two distinct phases: an initial phase dominated by copying and surface-level similarities, and a second phase in which more generalizing translation mechanisms are developed while copying is refined. Together, these findings provide a fine-grained view of how cross-lingual generalization develops during multilingual pretraining.
