Language Modeling with Editable External Knowledge
This paper addresses the challenge of keeping language models current as information about the world changes, though its contribution is incremental because it builds on retrieval-augmented generation.
The paper tackles the problem of updating language models with new external knowledge by introducing ERASE, a method that incrementally edits the knowledge base whenever a new document is added. On two new benchmark datasets, ERASE improves accuracy over conventional retrieval-augmented generation by 7-13% absolute with Mixtral-8x7B and 6-10% absolute with Llama-3-8B.
When the world changes, so does the text that humans write about it. How do we build language models that can be easily updated to reflect these changes? One popular approach is retrieval-augmented generation, in which new documents are inserted into a knowledge base and retrieved during prediction for downstream tasks. Most prior work on these systems has focused on improving behavior during prediction through better retrieval or reasoning. This paper introduces ERASE, which instead improves model behavior when new documents are acquired, by incrementally deleting or rewriting other entries in the knowledge base each time a document is added. On two new benchmark datasets evaluating models' ability to answer questions about a stream of news articles or conversations, ERASE improves accuracy relative to conventional retrieval-augmented generation by 7-13% (Mixtral-8x7B) and 6-10% (Llama-3-8B) absolute. Code and data are available at https://github.com/belindal/ERASE
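The abstract describes ERASE only at a high level: on every insertion, related existing entries are retrieved and each is kept, deleted, or rewritten before the new document is stored. The sketch below illustrates that edit-on-insert loop under stated assumptions. It is not the authors' implementation (see the linked repository for that). The names `KnowledgeBase`, `add_document`, and `toy_judge` are hypothetical. Retrieval is a toy word-overlap scorer, and the keep/delete/rewrite decision, which ERASE makes with a language model, is replaced here by a simple string-matching rule over `"key: value"` entries.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Literal, Optional, Tuple

Action = Literal["keep", "delete", "rewrite"]
# A judge takes (existing_entry, new_document) and returns an action plus,
# for "rewrite", the replacement text. In ERASE this role is played by an LM;
# here it is a hypothetical stand-in.
Judge = Callable[[str, str], Tuple[Action, Optional[str]]]


@dataclass
class KnowledgeBase:
    entries: list[str] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 5) -> list[int]:
        # Toy retriever (assumption): rank entries by word overlap with the query
        # and return the indices of the top-k entries that share at least one word.
        q = set(query.lower().split())
        scored = [(len(q & set(e.lower().split())), i) for i, e in enumerate(self.entries)]
        scored.sort(reverse=True)
        return [i for score, i in scored[:k] if score > 0]


def add_document(kb: KnowledgeBase, doc: str, judge: Judge) -> None:
    # 1. Retrieve existing entries related to the incoming document.
    # 2. Ask the judge whether each one should be kept, deleted, or rewritten.
    # 3. Apply the edits, then insert the new document itself.
    to_delete = []
    for i in kb.retrieve(doc):
        action, new_text = judge(kb.entries[i], doc)
        if action == "delete":
            to_delete.append(i)
        elif action == "rewrite" and new_text is not None:
            kb.entries[i] = new_text
    # Delete from the highest index down so earlier indices stay valid.
    for i in sorted(to_delete, reverse=True):
        del kb.entries[i]
    kb.entries.append(doc)


def toy_judge(entry: str, doc: str) -> Tuple[Action, Optional[str]]:
    # Hypothetical rule: an entry is stale if the new document states a
    # different value for the same key ("key: value" format assumed).
    entry_key, _, entry_val = entry.partition(":")
    doc_key, _, doc_val = doc.partition(":")
    if entry_key.strip() == doc_key.strip() and entry_val.strip() != doc_val.strip():
        return "delete", None
    return "keep", None


if __name__ == "__main__":
    kb = KnowledgeBase()
    for doc in ["CEO of Acme: Alice", "HQ of Acme: Boston", "CEO of Acme: Bob"]:
        add_document(kb, doc, toy_judge)
    print(kb.entries)  # ['HQ of Acme: Boston', 'CEO of Acme: Bob']
```

In this toy run, the outdated "CEO of Acme: Alice" entry is removed when the newer fact arrives, whereas plain retrieval-augmented generation would keep both and leave the reader model to resolve the conflict at prediction time. That difference is what the abstract means by improving behavior when documents are acquired rather than during prediction.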