CL LG · Oct 19, 2020

BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Jonas Wallat, Jaspreet Singh, Avishek Anand

arXiv:2010.09313v23.666 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper gives NLP researchers insight into how BERT represents knowledge, though the advance is incremental because it builds on existing probing methods.

The paper investigates how BERT captures and forgets relational knowledge across its layers. It finds that intermediate layers contribute 17-60% of the total knowledge. Fine-tuning causes forgetting, and the extent depends on the fine-tuning objective but not on dataset size; ranking models retain the most.

Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten, but the extent of forgetting depends on the fine-tuning objective and not on the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer. We release our code on GitHub so that the experiments can be repeated.
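One plausible reading of the 17-60% figure is the share of correctly answered facts that some intermediate layer recovers but the final layer misses. The sketch below illustrates that bookkeeping under this assumption; it is not taken from the authors' released code. The function name `intermediate_share`, the layer indices, and the fact identifiers are hypothetical, and the probe results are toy values made up for illustration.

```python
from typing import Dict, Set


def intermediate_share(correct_by_layer: Dict[int, Set[str]]) -> float:
    """Fraction of facts answered at any layer that the final layer misses.

    correct_by_layer maps a layer index to the set of fact ids that a probe
    on that layer answered correctly (hypothetical input format).
    """
    if not correct_by_layer:
        raise ValueError("need results for at least one layer")
    last_layer = max(correct_by_layer)
    # "Total knowledge" here is the union of correct facts over all layers.
    total = set().union(*correct_by_layer.values())
    if not total:
        return 0.0
    # Facts found by some earlier layer but not by the final layer.
    only_intermediate = total - correct_by_layer[last_layer]
    return len(only_intermediate) / len(total)


# Toy, made-up probe results for three layers (not data from the paper).
toy_results = {
    10: {"f1", "f2"},
    11: {"f2", "f3", "f4"},
    12: {"f2", "f4", "f5"},
}
share = intermediate_share(toy_results)
print(f"intermediate-only share: {share:.2f}")
```

In this toy case five distinct facts are answered somewhere, and two of them (`f1`, `f3`) are missing from the final layer, so probing only the last layer would understate the total by two fifths.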
