CL Apr 18, 2021

Knowledge Neurons in Pretrained Transformers

Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei

arXiv:2104.08696v236.1837 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the interpretability of large language models, though the advance is incremental because it builds on existing attribution methods.

The paper tackles the problem of understanding how factual knowledge is stored in pretrained Transformers by introducing knowledge neurons, showing that their activation correlates with fact expression and enabling editing of specific facts without fine-tuning.

Large-scale pretrained language models are surprisingly good at recalling factual knowledge presented in the training corpus. In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons. Specifically, we examine the fill-in-the-blank cloze task for BERT. Given a relational fact, we propose a knowledge attribution method to identify the neurons that express the fact. We find that the activation of such knowledge neurons is positively correlated with the expression of their corresponding facts. In our case studies, we attempt to leverage knowledge neurons to edit (for example, update or erase) specific factual knowledge without fine-tuning. Our results shed light on understanding the storage of knowledge within pretrained Transformers. The code is available at https://github.com/Hunter-DDM/knowledge-neurons.
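
The abstract does not spell out the attribution method, so the following is only a rough sketch of the general idea, assuming an integrated-gradients-style attribution over hidden neurons. It uses a toy softmax layer in NumPy rather than BERT, and every name in it (`attribute_neurons`, `target_prob`, the weights `W`, the activations `h`) is hypothetical and not taken from the authors' repository. Each neuron's activation is scaled from zero up to its observed value while the others stay fixed, and the gradient of the target probability is accumulated along the way.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def target_prob(h, W, target):
    # Probability of the target answer given hidden activations h (toy model).
    return softmax(W @ h)[target]

def grad_target_prob(h, W, target):
    # Analytic gradient of the target probability with respect to h.
    p = softmax(W @ h)
    return p[target] * (W[target] - p @ W)

def attribute_neurons(h, W, target, steps=20):
    # Riemann-sum approximation of a per-neuron integrated gradient:
    # neuron i is scaled from 0 to h[i]; all other neurons keep their values.
    attributions = np.zeros_like(h)
    for i in range(h.size):
        grad_sum = 0.0
        for k in range(1, steps + 1):
            scaled = h.copy()
            scaled[i] = h[i] * k / steps
            grad_sum += grad_target_prob(scaled, W, target)[i]
        attributions[i] = h[i] * grad_sum / steps
    return attributions

# Hypothetical toy data: 4 hidden neurons, 3 candidate answers.
h = np.array([0.5, 1.0, 0.2, 1.5])
W = np.array([[0.2, -0.1, 0.0, 0.3],
              [0.1, 0.4, -0.2, 2.0],
              [-0.3, 0.2, 0.1, -0.5]])
target = 1

scores = attribute_neurons(h, W, target)
ranking = np.argsort(scores)[::-1]  # neurons ordered by attribution, highest first
print("attribution scores:", scores)
print("neurons ranked by attribution:", ranking)
```

In this toy setting, a neuron's score approximates how much the target probability drops when that neuron alone is zeroed out, which mirrors the erase-style editing the abstract mentions. A real study would apply the attribution to a Transformer's feed-forward activations instead.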
