IR · AI · CL · Apr 7, 2025

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation

arXiv:2504.05220v5 · 4 citations · h-index: 13 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient annotation in retrieval and RAG systems, offering a cost-effective solution for initializing QA systems on new corpora, though it is incremental in leveraging existing LLMs for annotation.

This paper tackles the cost of human annotation for training retrieval and retrieval-augmented generation systems by using LLMs to annotate document utility. Training on these annotations improves out-of-domain retrieval and RAG outcomes, and LLM annotations combined with 20% of the human labels perform comparably to full human annotation.

This paper explores the use of large language models (LLMs) for annotating document utility in training retrieval and retrieval-augmented generation (RAG) systems, aiming to reduce dependence on costly human annotations. We address the gap between retrieval relevance and generative utility by employing LLMs to annotate document utility. To effectively utilize multiple positive samples per query, we introduce a novel loss that maximizes their summed marginal likelihood. Using the Qwen-2.5-32B model, we annotate utility on the MS MARCO dataset and conduct retrieval experiments on MS MARCO and BEIR, as well as RAG experiments on MS MARCO QA, NQ, and HotpotQA. Our results show that LLM-generated annotations enhance out-of-domain retrieval performance and improve RAG outcomes compared to models trained solely on human annotations or downstream QA metrics. Furthermore, combining LLM annotations with just 20% of human labels achieves performance comparable to using full human annotations. Our study offers a comprehensive approach to utilizing LLM annotations for initializing QA systems on new corpora.
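The abstract does not spell out the loss, so the following is only a minimal sketch of one common reading of "maximizing the summed marginal likelihood" of several positives: take a softmax over all candidate scores for a query, sum the probability mass assigned to the positives, and minimize the negative log of that sum. The function names, the plain-list inputs, and the softmax scoring are assumptions for illustration, not the paper's implementation.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def summed_positive_likelihood_loss(scores, is_positive):
    """Negative log of the total softmax probability given to all positives.

    scores:      query-document similarity scores for one query (hypothetical input)
    is_positive: booleans marking which candidates were annotated as useful
    """
    if len(scores) != len(is_positive):
        raise ValueError("scores and is_positive must have the same length")
    positive_scores = [s for s, p in zip(scores, is_positive) if p]
    if not positive_scores:
        raise ValueError("at least one positive document is required")
    # -log( sum_{d in P} exp(s_d) / sum_{d in all} exp(s_d) )
    return _logsumexp(scores) - _logsumexp(positive_scores)


if __name__ == "__main__":
    # Two positives and two negatives with arbitrary example scores.
    print(summed_positive_likelihood_loss([2.0, 1.5, 0.1, -0.3],
                                          [True, True, False, False]))
```

With a single positive this reduces to the usual softmax cross-entropy used in contrastive retriever training. With several positives, the loss only requires that the positives jointly receive high probability, so it does not force every positive to outrank every other positive.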

