CRANE: Causal Relevance Analysis of Language-Specific Neurons in Multilingual Large Language Models
This work addresses the need for better interpretability in multilingual LLMs, though its contribution is incremental because it builds on prior neuron-analysis methods.
The authors address how language capabilities are organized at the neuron level in multilingual large language models. They propose CRANE, a relevance-based analysis framework that identifies language-specific neurons through targeted interventions, and show that it isolates language-specific components more precisely than activation-based methods.
Multilingual large language models (LLMs) achieve strong performance across languages, yet how language capabilities are organized at the neuron level remains poorly understood. Prior work has identified language-related neurons mainly through activation-based heuristics, which conflate language preference with functional importance. We propose CRANE, a relevance-based analysis framework that redefines language specificity in terms of functional necessity, identifying language-specific neurons through targeted neuron-level interventions. CRANE characterizes each neuron's specialization by its contribution to language-conditioned predictions rather than by its activation magnitude. Neuron-level interventions reveal a consistent asymmetric pattern: masking neurons relevant to a target language selectively degrades performance on that language while largely preserving performance on other languages, indicating language-selective but non-exclusive neuron specialization. Experiments on English, Chinese, and Vietnamese across multiple benchmarks, together with a dedicated relevance-based metric and a base-to-chat model transfer analysis, show that CRANE isolates language-specific components more precisely than activation-based methods. Our implementation will be made publicly available.
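The distinction between activation magnitude and functional relevance can be illustrated with a toy sketch. This is not the CRANE implementation: it assumes a single linear readout and a simple mask-one-neuron relevance score, and every name in it (`logit`, `ablation_relevance`, `top_k`, `degradation`) and every number is hypothetical.

```python
from typing import Dict, List, Sequence, Set


def logit(hidden: Sequence[float], weights: Sequence[float], masked: Set[int]) -> float:
    """Linear readout with the neurons in `masked` zeroed out (toy stand-in for a model)."""
    return sum(w * h for i, (w, h) in enumerate(zip(weights, hidden)) if i not in masked)


def ablation_relevance(hidden: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Relevance of neuron i = drop in the logit when only neuron i is masked."""
    base = logit(hidden, weights, set())
    return [base - logit(hidden, weights, {i}) for i in range(len(hidden))]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def degradation(base: Dict[str, float], masked: Dict[str, float], target: str) -> Dict[str, float]:
    """Score drop on the target language versus the mean drop on the other languages."""
    others = [lang for lang in base if lang != target]
    return {
        "target_drop": base[target] - masked[target],
        "other_drop": sum(base[lang] - masked[lang] for lang in others) / len(others),
    }


# Made-up activations and readout weights for three neurons.
# Neuron 0 fires strongly but has zero weight, so masking it changes nothing.
hidden = [5.0, 1.0, 2.0]
weights = [0.0, 3.0, 1.0]

by_activation = top_k(hidden, 1)                               # selects neuron 0
by_relevance = top_k(ablation_relevance(hidden, weights), 1)   # selects neuron 1
```

In this toy setting the activation-based ranking picks a neuron whose removal has no effect on the prediction, while the intervention-based ranking picks the neuron the prediction actually depends on. Given per-language benchmark scores before and after masking, `degradation` summarizes the asymmetric pattern described above: a large `target_drop` alongside a small `other_drop` would indicate language-selective but non-exclusive neurons.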