CL | Feb 28, 2025

Capability Localization: Capabilities Can be Localized rather than Individual Knowledge

Xiusheng Huang, Jiaxiang Liu, Yequan Wang, Jun Zhao, Kang Liu

arXiv:2502.20992v1 | 1 citation | h-index: 28 | Has Code | ICLR

Originality: Incremental advance

AI Analysis

This work targets researchers studying language model interpretability. It proposes localizing capabilities rather than individual knowledge, an incremental advance that nonetheless yields specific insights.

The paper examines how model parameters affect performance in large language models. It argues that individual knowledge cannot be localized but that data commonalities can, and its new method, Commonality Neuron Localization (CNL), achieves a 96.42% neuron overlap rate on the GSM8K dataset.

Large-scale language models have achieved superior performance on natural language processing tasks; however, it is still unclear how model parameters affect performance improvement. Previous studies assumed that individual knowledge is stored in local parameters, but the proposed storage forms (dispersed parameters, parameter layers, or parameter chains) are not unified. Through fidelity and reliability evaluation experiments, we found that individual knowledge cannot be localized. We then constructed a dataset for decoupling experiments and discovered the potential for localizing data commonalities. To further reveal this phenomenon, this paper proposes a Commonality Neuron Localization (CNL) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset. Finally, we demonstrate through cross-data experiments that commonality neurons are a collection of capability neurons that can enhance performance. Our code is available at https://github.com/nlpkeg/Capability-Neuron-Localization.
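
The abstract reports a neuron overlap rate without defining it here. As a rough illustration only, the sketch below assumes one plausible definition: the share of the smaller located-neuron set that also appears in the other set. The `(layer, index)` addressing, the function name `overlap_rate`, and the toy data are hypothetical and are not taken from the paper or its repository.

```python
from typing import Iterable, Tuple

# Hypothetical addressing scheme: (layer index, neuron index within the layer).
Neuron = Tuple[int, int]


def overlap_rate(a: Iterable[Neuron], b: Iterable[Neuron]) -> float:
    """Fraction of the smaller neuron set that also appears in the other set.

    This is an assumed definition for illustration; the paper may normalize
    differently (for example, by the union of the two sets).
    """
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


# Toy data: neurons located from two different samples of the same dataset.
run_1 = {(0, 3), (0, 7), (1, 2), (2, 5)}
run_2 = {(0, 3), (0, 7), (1, 2), (2, 9)}

print(f"{overlap_rate(run_1, run_2):.2%}")  # prints 75.00%
```

Under this assumed definition, a rate close to 100% would mean that nearly the same neurons are located regardless of which samples from the dataset are used.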
