CL AI Apr 4, 2024

Embedding-Informed Adaptive Retrieval-Augmented Generation of Large Language Models

Chengkai Huang, Yu Xia, Rui Wang, Kaige Xie, Tong Yu, Julian McAuley, Lina Yao

arXiv:2404.03514v2 13.824 citations h-index: 11 COLING

Originality: Incremental advance

AI Analysis

This work addresses efficiency and accuracy issues in retrieval-augmented generation that matter to NLP practitioners. It is incremental in that it builds on prior adaptive retrieval methods.

The paper tackles the problem that retrieval is not always helpful when a large language model already knows the answer. It proposes an adaptive retrieval method that uses the model's pre-trained token embeddings to decide when to retrieve, and it reports superior performance across benchmarks.

Retrieval-augmented large language models (LLMs) have proven remarkably competent in various NLP tasks. However, previous works observed that retrieval is not always helpful, especially when the LLM is already knowledgeable about the query. Motivated by this, Adaptive Retrieval-Augmented Generation (ARAG) studies retrieving only when the knowledge requested by the query is absent from the LLM. Previous ARAG approaches require either access to the pre-training corpus or prompting with additional model inferences. To avoid these drawbacks, we propose to determine whether the model is knowledgeable about a query by inspecting the (contextualized) pre-trained token embeddings of the LLM. We hypothesize that such embeddings capture rich information about the model's intrinsic knowledge base, which enables an efficient way of judging whether retrieval from an external corpus is necessary. Extensive experiments demonstrate our ARAG approach's superior performance across various benchmarks.
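The retrieval gate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the query's token embeddings are mean-pooled and scored by a small logistic classifier, and every name in it (`mean_pool`, `needs_retrieval`, `answer`, and the `embed`, `retrieve`, `generate` callables) is hypothetical. The classifier weights would have to be trained on queries labeled by whether the model answers them correctly without retrieval.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Vector = Sequence[float]


def mean_pool(token_embeddings: Sequence[Vector]) -> list[float]:
    """Average per-token embeddings into a single query vector."""
    if not token_embeddings:
        raise ValueError("query has no tokens")
    dim = len(token_embeddings[0])
    count = len(token_embeddings)
    return [sum(vec[i] for vec in token_embeddings) / count for i in range(dim)]


def needs_retrieval(
    token_embeddings: Sequence[Vector],
    weights: Vector,
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Return True when the (assumed) classifier predicts a knowledge gap."""
    pooled = mean_pool(token_embeddings)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    probability = 1.0 / (1.0 + math.exp(-logit))  # logistic sigmoid
    return probability >= threshold


def answer(
    query: str,
    embed: Callable[[str], Sequence[Vector]],
    retrieve: Callable[[str], str],
    generate: Callable[[str], str],
    weights: Vector,
    bias: float,
) -> str:
    """Call the retriever only when the embedding gate asks for it."""
    if needs_retrieval(embed(query), weights, bias):
        context = retrieve(query)
        return generate(f"{context}\n\n{query}")
    return generate(query)
```

The gate needs only the embeddings of the query, so in this sketch it costs no extra generation pass and no access to the pre-training corpus, which are the two drawbacks of earlier ARAG methods that the abstract names.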
