Embedding-Informed Adaptive Retrieval-Augmented Generation of Large Language Models
This work addresses the efficiency and accuracy of retrieval-augmented generation and is relevant to NLP practitioners, though its contribution is incremental because it builds on prior adaptive retrieval methods.
The paper tackles the problem that retrieval is not always helpful when a large language model already knows the answer. It proposes an adaptive retrieval method that uses pre-trained token embeddings to decide when to retrieve, and reports superior performance across benchmarks.
Retrieval-augmented large language models (LLMs) have proven remarkably competent across a range of NLP tasks. However, previous work has observed that retrieval is not always helpful, especially when the LLM already has the knowledge needed to answer the query. Motivated by this, Adaptive Retrieval-Augmented Generation (ARAG) studies retrieving only when the knowledge the query asks for is absent from the LLM. Previous ARAG methods either require access to the pre-training corpus or rely on prompting, which costs additional model inferences. To avoid these drawbacks, we propose to determine whether the model is knowledgeable about a query by inspecting the (contextualized) pre-trained token embeddings of the LLM. We hypothesize that such embeddings capture rich information about the model's intrinsic knowledge base, which enables an efficient way to judge whether retrieval from an external corpus is necessary. Extensive experiments demonstrate our ARAG approach's superior performance across various benchmarks.
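The gating idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the embeddings are inspected, so everything here is an assumption made for illustration: the mean pooling, the logistic-regression probe, the names `RetrievalGate` and `answer_query`, the 0.5 threshold, and the toy 2-D vectors that stand in for real LLM token embeddings. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def mean_pool(token_embeddings: Sequence[Vector]) -> List[float]:
    """Average per-token vectors into one query vector (assumed pooling choice)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


class RetrievalGate:
    """Hypothetical probe estimating P(model lacks the knowledge | query embedding)."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_needs_retrieval(self, x: Vector) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, xs: Sequence[Vector], ys: Sequence[int], epochs: int = 200) -> None:
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.prob_needs_retrieval(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err

    def should_retrieve(self, x: Vector, threshold: float = 0.5) -> bool:
        return self.prob_needs_retrieval(x) >= threshold


def answer_query(
    token_embeddings: Sequence[Vector],
    gate: RetrievalGate,
    retrieve: Callable[[], str],
    generate: Callable[[Optional[str]], str],
) -> str:
    """Call the retriever only when the gate predicts missing knowledge."""
    query_vec = mean_pool(token_embeddings)
    context = retrieve() if gate.should_retrieve(query_vec) else None
    return generate(context)


# Toy training data (made up): label 1 means retrieval was needed for that query.
known_queries = [[1.0, 0.0], [0.9, 0.1]]
unknown_queries = [[0.0, 1.0], [0.1, 0.9]]
gate = RetrievalGate(dim=2)
gate.fit(known_queries + unknown_queries, [0, 0, 1, 1])
```

In a real system, `token_embeddings` would come from the LLM's own embedding layer, and `retrieve` and `generate` would wrap an actual retriever and the model. The point of the sketch is only that the retrieval decision reads the embeddings and costs no additional LLM inference, in line with the efficiency argument in the abstract.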