AI · CL · IR · May 5, 2025

Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing

arXiv:2505.02811v2 · 26 citations · h-index: 8 · SIGIR
Originality: Incremental advance
AI Analysis

This paper addresses inefficiencies in multi-round RAG systems on complex tasks. It offers a data-efficient way to improve a system's self-awareness, though the advance is incremental because it builds on existing RAG methods.

The paper tackles multi-round retrieval in RAG systems, which often lack self-awareness and therefore search inefficiently or answer incorrectly. It introduces SIM-RAG, a framework that uses self-practicing to train a lightweight Critic for guiding retrieval decisions. The result is improved performance across multiple benchmarks without requiring costly human-labeled data.

Retrieval Augmented Generation (RAG) has shown strong capability in enhancing language models' knowledge and reducing AI generative hallucinations, driving its widespread use. However, complex tasks requiring multi-round retrieval remain challenging, and early attempts tend to be overly optimistic without a good sense of self-skepticism. Current multi-round RAG systems may continue searching even when enough information has already been retrieved, or they may provide incorrect answers without having sufficient information or knowledge. Existing solutions either require large amounts of expensive human-labeled process supervision data or lead to subpar performance. This paper aims to address these limitations by introducing a new framework, SIM-RAG, to explicitly enhance RAG systems' self-awareness and multi-round retrieval capabilities. To train SIM-RAG, we first let a RAG system self-practice multi-round retrieval, augmenting existing question-answer pairs with intermediate inner monologue reasoning steps to generate synthetic training data. For each pair, the system may explore multiple retrieval paths, which are labeled as successful if they reach the correct answer and unsuccessful otherwise. Using this data, we train a lightweight information sufficiency Critic. At inference time, the Critic evaluates whether the RAG system has retrieved sufficient information at each round, guiding retrieval decisions and improving system-level self-awareness through in-context reinforcement learning. Experiments across multiple prominent RAG benchmarks show that SIM-RAG is an effective multi-round RAG solution. Furthermore, this framework is system-efficient, adding a lightweight component to RAG without requiring modifications to existing LLMs or search engines, and data-efficient, eliminating the need for costly human-annotated mid-step retrieval process supervision data.
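The two phases described in the abstract, self-practice to produce labeled data and Critic-gated retrieval at inference, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `Reasoner`, `Retriever`, `Critic`, `Example`, `self_practice`, and `answer_with_critic` are hypothetical, exact-match scoring and per-round labeling are simplifying assumptions, and returning the last answer when the round budget runs out is an assumed fallback.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the system's components (not the paper's API).
Reasoner = Callable[[str, List[str]], Tuple[str, str]]  # (question, context) -> (answer, next_query)
Retriever = Callable[[str], List[str]]                   # query -> retrieved documents
Critic = Callable[[str, List[str], str], bool]           # (question, context, answer) -> sufficient?


@dataclass
class Example:
    """One synthetic training example for the Critic."""
    question: str
    context: List[str]
    answer: str
    label: int  # 1 if the answer matched the gold answer, else 0


def self_practice(question: str, gold: str, reasoner: Reasoner,
                  retriever: Retriever, max_rounds: int = 3) -> List[Example]:
    """Roll out one retrieval path and label each round by answer correctness.

    Assumption: exact match (case-insensitive) decides success.
    """
    context: List[str] = []
    examples: List[Example] = []
    for _ in range(max_rounds):
        answer, next_query = reasoner(question, context)
        correct = answer.strip().lower() == gold.strip().lower()
        examples.append(Example(question, list(context), answer, int(correct)))
        if correct:
            break  # the path reached the correct answer; stop searching
        context.extend(retriever(next_query))
    return examples


def answer_with_critic(question: str, reasoner: Reasoner, retriever: Retriever,
                       critic: Critic, max_rounds: int = 3) -> Tuple[str, int]:
    """Retrieve round by round until the Critic judges the information sufficient.

    Returns the answer and the number of rounds used. If the budget is
    exhausted, the last answer is returned (an assumed fallback).
    """
    context: List[str] = []
    answer = ""
    for round_idx in range(1, max_rounds + 1):
        answer, next_query = reasoner(question, context)
        if critic(question, context, answer) or round_idx == max_rounds:
            return answer, round_idx
        context.extend(retriever(next_query))
    return answer, max_rounds
```

In this sketch the examples returned by `self_practice` would serve as training data for the Critic, and any callable with the `Critic` signature, such as a small fine-tuned classifier, could be passed to `answer_with_critic` without changing the reasoner or the retriever.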

Code Implementations: 1 repo
