CL Jan 30

When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training

Felicia Körner, Max Müller-Eberstein, Anna Korhonen, Barbara Plank

arXiv:2601.22851v1 1.62 citations h-index: 5

Originality: Incremental advance

AI Analysis

This paper offers insights into cross-lingual alignment dynamics for researchers in multilingual NLP, though its analysis of training processes is an incremental advance.

The study investigated the emergence and quality of shared concept spaces during multilingual language model training, finding that these spaces develop early and refine over time, but alignment is language-dependent, and some translation gains reflect behavioral shifts rather than improved ability.

Training Large Language Models (LLMs) with high multilingual coverage is becoming increasingly important -- especially when monolingual resources are scarce. Recent studies have found that LLMs process multilingual inputs in shared concept spaces, thought to support generalization and cross-lingual transfer. However, these prior studies often do not use causal methods, lack deeper error analysis or focus on the final model only, leaving open how these spaces emerge during training. We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM through the causal interpretability method of activation patching. We isolate cross-lingual concept representations, then inject them into a translation prompt to investigate how consistently translations can be altered, independently of the language. We find that shared concept spaces emerge early and continue to refine, but that alignment with them is language-dependent. Furthermore, in contrast to prior work, our fine-grained manual analysis reveals that some apparent gains in translation quality reflect shifts in behavior -- like selecting senses for polysemous words or translating instead of copying cross-lingual homographs -- rather than improved translation ability. Our findings offer new insight into the training dynamics of cross-lingual alignment and the conditions under which causal interpretability methods offer meaningful insights in multilingual contexts.
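The general idea behind activation patching can be illustrated with a minimal sketch. It assumes a toy two-layer network in pure Python rather than EuroLLM or the authors' actual setup; the weights, inputs, and function names (`forward`, `matvec`, `argmax`) are all hypothetical. A hidden activation is cached from a "source" run and injected into a "target" run, and the change in output shows the causal effect of that activation.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(x, w_in, w_out, patch=None):
    """Run a toy two-layer network.

    If `patch` is given, it overwrites the hidden activation before the
    output layer. Returns (output, hidden) so the hidden state can be cached.
    """
    hidden = [math.tanh(v) for v in matvec(w_in, x)]
    if patch is not None:
        hidden = list(patch)  # inject the cached activation
    return matvec(w_out, hidden), hidden


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 output classes.
W_IN = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
W_OUT = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]

source_x = [1.0, 0.0]  # stands in for a prompt carrying the source concept
target_x = [0.0, 1.0]  # stands in for the prompt being intervened on

# 1. Clean source run: cache the hidden activation.
source_out, source_hidden = forward(source_x, W_IN, W_OUT)
# 2. Clean target run: baseline behaviour.
target_out, _ = forward(target_x, W_IN, W_OUT)
# 3. Patched target run: same input, hidden state replaced by the cached one.
patched_out, _ = forward(target_x, W_IN, W_OUT, patch=source_hidden)

print("target prediction :", argmax(target_out))
print("patched prediction:", argmax(patched_out))
```

In this toy the entire hidden layer is replaced, so the patched output trivially matches the source run. In a transformer, a patch would typically touch only selected layers and token positions, which is what makes the resulting change in output informative.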

