KILM: Knowledge Injection into Encoder-Decoder Language Models
KILM addresses knowledge retention and hallucination in language models for applications that require accurate entity handling. The contribution is incremental, however, because it builds on existing encoder-decoder architectures without major modifications.
The paper addresses the problem of enhancing implicit knowledge in pre-trained language models. It proposes KILM, a method that injects entity-related knowledge through generative knowledge infilling during continued pre-training. The resulting models retain more knowledge, hallucinate less, and achieve better zero-shot performance on tasks such as entity disambiguation, where they outperform state-of-the-art models with 30x more parameters.
Large pre-trained language models (PLMs) have been shown to retain implicit knowledge within their parameters. To enhance this implicit knowledge, we propose Knowledge Injection into Language Models (KILM), a novel approach that injects entity-related knowledge into encoder-decoder PLMs, via a generative knowledge infilling objective through continued pre-training. This is done without architectural modifications to the PLMs or adding additional parameters. Experimental results over a suite of knowledge-intensive tasks spanning numerous datasets show that KILM enables models to retain more knowledge and hallucinate less, while preserving their original performance on general NLU and NLG tasks. KILM also demonstrates improved zero-shot performances on tasks such as entity disambiguation, outperforming state-of-the-art models having 30x more parameters.
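To make the idea of a generative knowledge infilling objective more concrete, the sketch below builds one training pair of the kind an encoder-decoder model could be trained on during continued pre-training. It is only an illustration under stated assumptions: the abstract does not specify the input format, so the marker strings (`<ent>`, `<ent_desc>`, `<mask>`), the `InfillingExample` type, and the `make_infilling_example` helper are all hypothetical and should not be read as the authors' exact procedure.

```python
from dataclasses import dataclass

# Hypothetical marker strings; the abstract does not specify any special tokens.
ENT_START, ENT_END = "<ent>", "</ent>"
DESC_START, DESC_END = "<ent_desc>", "</ent_desc>"
MASK = "<mask>"


@dataclass
class InfillingExample:
    source: str  # encoder input with the entity description masked out
    target: str  # text the decoder would be trained to generate


def make_infilling_example(
    text: str, start: int, end: int, description: str
) -> InfillingExample:
    """Wrap the mention text[start:end] and append a masked description slot.

    Assumption: knowledge is injected by asking the model to fill in a short
    description of the marked entity; the real objective may differ in detail.
    """
    if not (0 <= start < end <= len(text)):
        raise ValueError("mention span is out of range")
    mention = text[start:end]
    marked = f"{ENT_START} {mention} {ENT_END} {DESC_START} {MASK} {DESC_END}"
    source = text[:start] + marked + text[end:]
    return InfillingExample(source=source, target=description)


example = make_infilling_example(
    "Paris is lovely in spring.", 0, 5, "capital of France"
)
print(example.source)
print(example.target)
```

Because the extra knowledge lives only in the training text and the markers, a setup like this needs no change to the model architecture, which is consistent with the abstract's claim that no architectural modifications or additional parameters are required.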