CL | Feb 22, 2024

Unveiling Linguistic Regions in Large Language Models

arXiv:2402.14700v3 | 46 citations | h-index: 40 | ACL
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding how LLMs achieve cross-lingual alignment. The advance is incremental: it builds on existing research, focusing on internal mechanisms rather than on improving generalization capabilities.

The paper investigates the intrinsic mechanisms of cross-lingual alignment in Large Language Models (LLMs) by identifying a core linguistic region comprising about 1% of parameters, whose removal causes significant performance drops across 30 languages, and shows that freezing this region during further pre-training can mitigate catastrophic forgetting.

Large Language Models (LLMs) have demonstrated considerable cross-lingual alignment and generalization ability. Current research primarily focuses on improving LLMs' cross-lingual generalization capabilities. However, there is still a lack of research on the intrinsic mechanisms of how LLMs achieve cross-lingual alignment. From the perspective of region partitioning, this paper conducts several investigations on the linguistic competence of LLMs. We discover a core region in LLMs that corresponds to linguistic competence, accounting for approximately 1% of the total model parameters. Removing this core region by setting parameters to zero results in a significant performance decrease across 30 different languages. Furthermore, this core region exhibits significant dimensional dependence: perturbations to even a single parameter on specific dimensions lead to a loss of linguistic competence. Moreover, we discover that distinct monolingual regions exist for different languages, and disruption to these specific regions substantially reduces the LLMs' proficiency in the corresponding languages. Our research also indicates that freezing the core linguistic region during further pre-training can mitigate catastrophic forgetting (CF), a common phenomenon observed during further pre-training of LLMs. Overall, exploring the LLMs' functional regions provides insights into the foundation of their intelligence.
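The two interventions named in the abstract, zeroing out a small parameter region and freezing that region during further training, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the core region is located, so the `importance` scores below are a hypothetical stand-in (random numbers), and the helper names `top_fraction_mask`, `ablate_region`, and `sgd_step_with_frozen_region` are introduced here purely for illustration on a toy weight matrix.

```python
import numpy as np


def top_fraction_mask(importance: np.ndarray, fraction: float = 0.01) -> np.ndarray:
    """Boolean mask marking the `fraction` of entries with the highest score."""
    k = max(1, int(round(importance.size * fraction)))
    flat = importance.ravel()
    # argpartition moves the k largest scores into the last k positions.
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(importance.shape)


def ablate_region(weights: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a copy of `weights` with the masked region set to zero."""
    return np.where(mask, 0.0, weights)


def sgd_step_with_frozen_region(
    weights: np.ndarray, grad: np.ndarray, mask: np.ndarray, lr: float = 0.1
) -> np.ndarray:
    """One plain SGD step that leaves the masked region unchanged."""
    # Zeroing the gradient inside the region is what "freezes" it.
    return weights - lr * np.where(mask, 0.0, grad)


rng = np.random.default_rng(0)
weights = rng.normal(size=(100, 100))        # toy stand-in for one weight matrix
importance = rng.random(size=weights.shape)  # ASSUMPTION: hypothetical scores
grad = rng.normal(size=weights.shape)        # toy gradient

region = top_fraction_mask(importance, fraction=0.01)  # about 1% of parameters
ablated = ablate_region(weights, region)
updated = sgd_step_with_frozen_region(weights, grad, region)
```

In this sketch `ablated` corresponds to the removal experiment (region parameters set to zero, everything else untouched), while `updated` corresponds to further training with the region frozen (everything except the region moves). In a real model the mask would be built per parameter tensor, and the score used to rank parameters would have to come from the paper itself.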

Code Implementations: 1 repo