LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding
This work aims to improve retrieval effectiveness in information retrieval systems, though it appears incremental because it builds on existing retriever models.
The paper improves retrieval models by introducing a model-agnostic doc-level embedding framework augmented with large language models (LLMs), and reports state-of-the-art results on the LoTTE and BEIR datasets.
Recently, embedding-based retrieval, also known as dense retrieval, has shown state-of-the-art results compared with traditional sparse or bag-of-words approaches. This paper introduces a model-agnostic doc-level embedding framework based on large language model (LLM) augmentation. It also improves several important components of the retrieval model training process, such as negative sampling and the loss function. By implementing this LLM-augmented retrieval framework, we significantly improve the effectiveness of widely used retriever models, including bi-encoders (Contriever, DRAGON) and late-interaction models (ColBERTv2), thereby achieving state-of-the-art results on the LoTTE and BEIR datasets.
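The abstract does not spell out how a doc-level embedding is assembled, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a document is represented by several "fields" (here passage chunks and LLM-generated synthetic queries), that each field is mean-pooled, and that the pooled vectors are merged by a weighted sum before scoring against the query with a dot product. The function names, field names, and weights are all hypothetical, and the toy vectors stand in for the output of a real encoder such as Contriever.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero or empty vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty list of equal-length vectors, component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product relevance score between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def doc_level_embedding(
    fields: Dict[str, Sequence[Sequence[float]]],
    weights: Dict[str, float],
) -> Vector:
    """Merge per-field embeddings into one document vector.

    Assumed scheme (not taken from the paper): mean-pool each field,
    normalize it, and add the fields together using per-field weights.
    Fields with no vectors or with zero weight are ignored.
    """
    combined: Vector = []
    for name, vectors in fields.items():
        weight = weights.get(name, 0.0)
        if not vectors or weight == 0.0:
            continue
        pooled = normalize(mean_pool(vectors))
        if not combined:
            combined = [0.0] * len(pooled)
        combined = [c + weight * p for c, p in zip(combined, pooled)]
    return normalize(combined)


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for real encoder outputs.
    doc_fields = {
        "chunk": [[1.0, 0.0]],            # embedding of the passage text
        "synthetic_query": [[0.0, 1.0]],  # embedding of an LLM-written query
    }
    query = [0.0, 1.0]

    chunk_only = doc_level_embedding(doc_fields, {"chunk": 1.0})
    augmented = doc_level_embedding(
        doc_fields, {"chunk": 1.0, "synthetic_query": 1.0}
    )
    print("chunk-only score:", dot(query, chunk_only))
    print("augmented score: ", dot(query, augmented))
```

In this toy setup the synthetic-query field pulls the document vector toward the user query, so the augmented score is higher than the chunk-only score. Because the combination happens purely at the embedding level, the same idea could in principle sit on top of any encoder, which is one way to read the "model-agnostic" claim.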