CL · Jul 30, 2025

From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs

Jie He, Victor Gutiérrez-Basulto, Jeff Z. Pan

arXiv:2507.22716v2 · 5 citations · h-index: 6 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses reasoning failures in reinforcement learning-based RAG for LLMs. It is an incremental improvement, with reported gains on four multi-hop QA benchmarks.

The paper tackles the problem of reinforcement learning-based retrieval-augmented generation (RAG) methods overlooking intermediate reasoning quality in large language models (LLMs), proposing TIRESRAG-R1 with a multi-dimensional reward system to improve reasoning and stability. Experiments on four multi-hop QA datasets show it outperforms prior RAG methods and generalizes to single-hop tasks.

Reinforcement learning-based retrieval-augmented generation (RAG) methods enhance the reasoning abilities of large language models (LLMs). However, most rely only on final-answer rewards, overlooking intermediate reasoning quality. This paper analyzes existing RAG reasoning models and identifies three main failure patterns: (1) information insufficiency, meaning the model fails to retrieve adequate support; (2) faulty reasoning, where logical or content-level flaws appear despite sufficient information; and (3) answer-reasoning inconsistency, where a valid reasoning chain leads to a mismatched final answer. We propose TIRESRAG-R1, a novel framework using a think-retrieve-reflect process and a multi-dimensional reward system to improve reasoning and stability. TIRESRAG-R1 introduces: (1) a sufficiency reward to encourage thorough retrieval; (2) a reasoning quality reward to assess the rationality and accuracy of the reasoning chain; and (3) a reflection reward to detect and revise errors. It also employs a difficulty-aware reweighting strategy and training sample filtering to boost performance on complex tasks. Experiments on four multi-hop QA datasets show that TIRESRAG-R1 outperforms prior RAG methods and generalizes well to single-hop tasks. The code and data are available at: https://github.com/probe2/TIRESRAG-R1.
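As a rough illustration of how a multi-dimensional reward with difficulty-aware reweighting might be combined, here is a minimal Python sketch. It is not the formula used by TIRESRAG-R1: the names `RewardSignals`, `difficulty_weight`, `combined_reward`, and `aux_scale`, the [0, 1] score range, and the choice to measure difficulty as one minus the mean answer score over rollouts of the same question are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RewardSignals:
    """Per-rollout scores, each assumed to lie in [0, 1] (hypothetical layout)."""
    answer: float       # final-answer correctness, e.g. F1 or exact match
    sufficiency: float  # whether the retrieved evidence supports an answer
    reasoning: float    # rationality and accuracy of the reasoning chain
    reflection: float   # whether reflection detected and revised an error


def difficulty_weight(group_answer_scores: list[float]) -> float:
    """Assumed difficulty proxy: 1 minus the mean answer score of all
    rollouts sampled for the same question (harder question, larger weight)."""
    if not group_answer_scores:
        raise ValueError("need at least one rollout score")
    mean_score = sum(group_answer_scores) / len(group_answer_scores)
    return 1.0 - mean_score


def combined_reward(
    signals: RewardSignals,
    group_answer_scores: list[float],
    aux_scale: float = 0.5,  # assumed hyperparameter, not from the paper
) -> float:
    """Answer reward plus difficulty-weighted mean of the three auxiliary rewards."""
    weight = difficulty_weight(group_answer_scores)
    auxiliary = (signals.sufficiency + signals.reasoning + signals.reflection) / 3.0
    return signals.answer + aux_scale * weight * auxiliary


if __name__ == "__main__":
    rollout = RewardSignals(answer=1.0, sufficiency=0.5, reasoning=0.5, reflection=0.5)
    # Two rollouts for this question: one correct, one wrong.
    print(combined_reward(rollout, group_answer_scores=[1.0, 0.0]))
```

In this sketch the auxiliary rewards contribute nothing when every rollout already answers correctly, so the intermediate-quality signals matter most on harder questions. The actual weighting scheme and the training sample filtering are defined in the paper and its repository.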
