CL | May 19, 2025

Transparent and Robust RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability

Jingyi Ren, Yekun Xu, Xiaolong Wang, Weitao Li, Weizhi Ma, Yang Liu

arXiv:2505.13258v2 | 5 citations | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses interpretability and robustness issues in RAG systems for knowledge-intensive applications, representing a strong incremental improvement.

The paper tackles the lack of transparency and training instability in reinforcement learning-based Retrieval-Augmented Generation (RAG) systems by proposing ARENA, a framework that improves accuracy by 10-30% on multi-hop QA datasets and achieves performance comparable to advanced closed-source LLMs.

Retrieval-Augmented Generation (RAG) delivers substantial value in knowledge-intensive applications. Many recent works use reinforcement learning (RL) to elicit strong reasoning in RAG generators. However, two key challenges remain unresolved: (1) Transparency: most prior methods do not explicitly indicate which references are actually used during the reasoning that leads to the final answer, limiting interpretability and visibility; (2) Stability: the KL divergence estimator used in existing RL-based approaches may cause gradient spikes, leading to unstable training. To address these challenges, we propose Adaptive-Rewarded Evidence Navigation Agent (ARENA), a transparent and robust RAG generator framework trained via RL with designed rewards. Based on our structured protocol, KL divergence stabilization, and adaptive reward calculation modules, ARENA enables the RAG generator to identify key evidence, perform structured reasoning, and generate answers with interpretable decision traces. Applied to Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct, extensive experiments across multiple baselines show 10-30% accuracy improvements on three multi-hop QA datasets, comparable to advanced closed-source LLMs (e.g., OpenAI o1, DeepSeek R1). Further analyses show that ARENA generalizes well to unseen datasets and tasks. Our models and code are publicly released.
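The stability concern in point (2) can be illustrated with a small numerical sketch. The abstract does not say which KL estimator is meant or how ARENA stabilizes it, so everything below is an assumption made for illustration: it uses the per-token "k3" estimator common in GRPO-style training, exp(r) - r - 1 with r = log(pi_ref / pi), and compares it with a quadratic "k2" estimator, 0.5 * r^2, chosen here only as a contrast. The function names are hypothetical and this is not the paper's method.

```python
import math


def k3_estimator(log_ratio: float) -> float:
    """Assumed 'k3' KL estimate for one token: exp(r) - r - 1, r = log(pi_ref / pi)."""
    return math.exp(log_ratio) - log_ratio - 1.0


def k3_gradient(log_ratio: float) -> float:
    """Derivative of k3 with respect to r: exp(r) - 1 (grows exponentially in r)."""
    return math.exp(log_ratio) - 1.0


def k2_estimator(log_ratio: float) -> float:
    """Quadratic 'k2' estimate used only for contrast: 0.5 * r^2."""
    return 0.5 * log_ratio ** 2


def k2_gradient(log_ratio: float) -> float:
    """Derivative of k2 with respect to r: r (grows linearly in r)."""
    return log_ratio


if __name__ == "__main__":
    # Compare gradient magnitudes as the policy drifts away from the reference.
    for r in (0.1, 1.0, 5.0, 10.0):
        print(f"r={r:5.1f}  k3 grad={k3_gradient(r):12.3f}  k2 grad={k2_gradient(r):7.3f}")
```

Near r = 0 the two estimators nearly coincide, but a single token whose log-ratio drifts to a large positive value gives the k3 term an exponentially large gradient, which is one plausible way a KL penalty could produce the spikes the abstract describes.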
