CLLGMar 9, 2025

How a Bilingual LM Becomes Bilingual: Tracing Internal Representations with Sparse Autoencoders

arXiv:2503.06394v26 citationsh-index: 13EMNLP
Originality Incremental advance
AI Analysis

This work provides insights into bilingual acquisition in language models, which is incremental and relevant for NLP researchers.

The study investigated how bilingual language models develop internal representations, finding that they first learn languages separately and then form bilingual alignments, especially in mid layers and larger models, and demonstrated that integrating decomposed representations from a fully trained model into a mid-training model improves performance.

This study explores how bilingual language models develop complex internal representations. We employ sparse autoencoders to analyze internal representations of bilingual language models with a focus on the effects of training steps, layers, and model sizes. Our analysis shows that language models first learn languages separately, and then gradually form bilingual alignments, particularly in the mid layers. We also found that this bilingual tendency is stronger in larger models. Building on these findings, we demonstrate the critical role of bilingual representations in model performance by employing a novel method that integrates decomposed representations from a fully trained model into a mid-training model. Our results provide insights into how language models acquire bilingual capabilities.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes