CL · AI · DB · IR · LG · Jan 15, 2022

Kformer: Knowledge Injection in Transformer Feed-Forward Layers

arXiv:2201.05742v2 · 51 citations · Has Code
AI Analysis

This work targets knowledge-intensive NLP tasks; its contribution is incremental, as it builds on existing knowledge injection techniques.

The authors tackled the problem of injecting external knowledge into pre-trained language models by proposing Kformer, a simple model that leverages both internal knowledge stored in feed-forward layers and external knowledge, achieving better performance on commonsense reasoning and medical question answering tasks compared to other injection methods.

Recent days have witnessed a diverse set of knowledge injection models for pre-trained language models (PTMs); however, most previous studies neglect the PTMs' own ability to store large quantities of implicit knowledge in their parameters. A recent study has observed knowledge neurons in the Feed Forward Network (FFN), which are responsible for expressing factual knowledge. In this work, we propose a simple model, Kformer, which takes advantage of both the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers. Empirical results on two knowledge-intensive tasks, commonsense reasoning (i.e., SocialIQA) and medical question answering (i.e., MedQA-USMLE), demonstrate that Kformer can yield better performance than other knowledge injection methods such as concatenation or attention-based injection. We think the proposed simple model and empirical findings may be helpful for the community to develop more powerful knowledge injection methods. Code is available at https://github.com/zjunlp/Kformer.
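The abstract does not spell out the injection mechanism, so the following is only a minimal sketch of one way injection into an FFN layer could look. It assumes the key-value memory view of the feed-forward layer (the first weight matrix as keys, the second as values) and assumes that external knowledge embeddings are linearly projected and appended as extra key/value slots. All names here (`ffn`, `inject`, `proj_k`, `proj_v`) are hypothetical and are not taken from the Kformer repository; biases and layer normalization are omitted.

```python
import math
import random


def gelu(x):
    """Exact GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(rows, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in rows]


def ffn(h, keys, values):
    """Key-value view of a feed-forward layer: sum_i gelu(h . k_i) * v_i."""
    scores = [gelu(s) for s in matvec(keys, h)]
    d_out = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) for j in range(d_out)]


def inject(h, keys, values, knowledge, proj_k, proj_v):
    """Assumed injection step: project each knowledge embedding into the
    key space and the value space, then append the results as extra slots."""
    extra_keys = [matvec(proj_k, e) for e in knowledge]
    extra_values = [matvec(proj_v, e) for e in knowledge]
    return ffn(h, keys + extra_keys, values + extra_values)


def rand_matrix(n_rows, n_cols):
    """Random matrix with entries in [-1, 1], for illustration only."""
    return [[random.uniform(-1, 1) for _ in range(n_cols)] for _ in range(n_rows)]


random.seed(0)
d_model, d_ff, d_know, n_facts = 4, 8, 3, 2   # toy sizes, chosen arbitrarily
h = [random.uniform(-1, 1) for _ in range(d_model)]
keys = rand_matrix(d_ff, d_model)             # stands in for the first FFN weight matrix
values = rand_matrix(d_ff, d_model)           # stands in for the second FFN weight matrix
knowledge = rand_matrix(n_facts, d_know)      # stands in for retrieved knowledge embeddings
proj_k = rand_matrix(d_model, d_know)         # hypothetical projection into key space
proj_v = rand_matrix(d_model, d_know)         # hypothetical projection into value space

print("plain FFN output:   ", ffn(h, keys, values))
print("with injected facts:", inject(h, keys, values, knowledge, proj_k, proj_v))
```

Under these assumptions the layer's output dimension is unchanged, and with no knowledge entries the layer reduces to the ordinary FFN, so the pre-trained weights are left intact and only the projections would need to be learned.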

Code Implementations: 1 repo
