CL · Jan 10, 2025

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

arXiv:2501.06346v2 · 38 citations · h-index: 8 · NAACL
AI Analysis

This paper addresses the problem of understanding cross-lingual representation learning in LLMs, which is relevant to AI researchers, but it is incremental in that it builds on existing work with sparse autoencoders.

The study investigated whether large language models (LLMs) share representations of grammatical concepts like number and tense across languages, finding that abstract concepts are encoded in feature directions shared across many languages, with causal interventions showing that ablating multilingual features reduces classifier performance to near-chance levels.

Human bilinguals often use similar brain regions to process multiple languages, depending on when they learned their second language and their proficiency. In large language models (LLMs), how are multiple languages learned and encoded? In this work, we explore the extent to which LLMs share representations of morphosyntactic concepts such as grammatical number, gender, and tense across languages. We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages. We use causal interventions to verify the multilingual nature of these representations; specifically, we show that ablating only multilingual features decreases classifier performance to near-chance across languages. We then use these features to precisely modify model behavior in a machine translation task; this demonstrates both the generality and selectivity of these features' roles in the network. Our findings suggest that even models trained predominantly on English data can develop robust, cross-lingual abstractions of morphosyntactic concepts.
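
The feature-ablation intervention mentioned in the abstract can be illustrated with a minimal sketch. This listing does not give the paper's autoencoder architecture or intervention details, so everything below is an assumption made for illustration: a plain ReLU sparse autoencoder, the hypothetical helper names `sae_encode`, `sae_decode`, and `ablate_features`, and the choice to add the autoencoder's reconstruction error back so that only the selected features change.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    """Map model activations to non-negative feature activations (ReLU encoder, assumed)."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def sae_decode(f, W_dec, b_dec):
    """Map feature activations back to the model's activation space."""
    return f @ W_dec + b_dec

def ablate_features(x, W_enc, b_enc, W_dec, b_dec, feature_ids):
    """Zero the selected features and return the patched activations.

    The reconstruction error of the autoencoder is added back (an assumption
    of this sketch), so the output differs from x only by the contribution
    of the ablated features.
    """
    ids = np.asarray(feature_ids, dtype=int)
    f = sae_encode(x, W_enc, b_enc)
    error = x - sae_decode(f, W_dec, b_dec)  # part of x the autoencoder does not capture
    f_ablated = f.copy()
    f_ablated[..., ids] = 0.0                # remove the chosen feature directions
    return sae_decode(f_ablated, W_dec, b_dec) + error
```

In a setup like the one the abstract describes, the patched activations would be fed to a probe classifier (for example, one predicting grammatical number), and its accuracy compared with and without the ablation. The classifier itself is outside this sketch.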

Code Implementations: 1 repo