CL AI | Feb 19, 2025

RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation

Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

arXiv:2502.13957v217.011 citations | h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the problem of optimizing language agents for knowledge-intensive tasks. It offers a systematic approach to improving performance in agentic RAG, though its advance over existing methods is incremental.

The paper tackles the lack of a unified optimization framework for agentic retrieval-augmented generation (RAG) by introducing RAG-Gym, which systematically explores prompt engineering, actor tuning, and critic training, resulting in the Re$^2$Search++ agent that achieves relative F1 improvements of 3.2% to 11.6% over recent methods.
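For readers unfamiliar with the term, a "relative" F1 improvement is conventionally measured against the baseline score rather than in absolute points. The expression below is the standard definition, assumed here rather than quoted from the paper; the symbols $\mathrm{F1}_{\text{new}}$ and $\mathrm{F1}_{\text{base}}$ are illustrative names for the average F1 of Re$^2$Search++ and of the compared method.

$$
\text{relative increase} = \frac{\mathrm{F1}_{\text{new}} - \mathrm{F1}_{\text{base}}}{\mathrm{F1}_{\text{base}}} \times 100\%
$$

Under this reading, a reported gain of 3.2% to 11.6% is a fraction of each baseline's score, so the absolute gain in F1 points depends on the baseline.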

Retrieval-augmented generation (RAG) has shown great promise for knowledge-intensive tasks and recently advanced with agentic RAG, where language agents engage in multi-round interactions with external knowledge sources for adaptive information retrieval. However, existing agentic RAG methods often depend on ad-hoc prompt engineering and lack a unified optimization framework. We introduce RAG-Gym, a comprehensive platform that systematically explores three optimization dimensions: (1) prompt engineering, (2) actor tuning, and (3) critic training. For prompt engineering, we propose Re$^2$Search, a novel agent incorporating reasoning reflection that significantly outperforms standard prompts. In actor tuning, we evaluate three popular post-training algorithms with fine-grained process supervision and identify direct preference optimization as the most effective. We further demonstrate that a trained critic can enhance inference by selecting higher-quality intermediate reasoning steps. Together, these findings lead to the optimized Re$^2$Search++ agent, which surpasses most recent methods like Search-R1 by a relative increase of 3.2% to 11.6% in average F1. Finally, we examine the impact of different reward sources and analyze scaling properties in training and inference, offering practical insights for agentic RAG optimization. The project homepage is available at https://rag-gym.github.io.
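The idea that a trained critic can select higher-quality intermediate steps at inference time can be sketched as a best-of-N selection loop. The following is a minimal illustration under stated assumptions, not the RAG-Gym implementation: `select_action`, `toy_propose`, and `toy_critic` are hypothetical names, and the toy critic is a word-overlap heuristic standing in for a learned model.

```python
from typing import Callable, List, Tuple


def select_action(
    state: str,
    propose: Callable[[str, int], List[str]],
    critic: Callable[[str, str], float],
    num_candidates: int = 4,
) -> Tuple[str, float]:
    """Ask the actor for candidate next steps and keep the one the critic scores highest."""
    candidates = propose(state, num_candidates)
    if not candidates:
        raise ValueError("actor returned no candidates")
    # Score every candidate step in the context of the current state.
    scored = [(critic(state, action), action) for action in candidates]
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_score


# Hypothetical stand-ins for an LLM actor and a trained critic (assumptions for illustration).
def toy_propose(state: str, n: int) -> List[str]:
    pool = [
        "search: capital of France",
        "answer: Paris",
        "search: weather",
        "answer: unknown",
    ]
    return pool[:n]


def toy_critic(state: str, action: str) -> float:
    # Toy heuristic: count words shared between the question and the candidate step.
    question_words = set(state.lower().split())
    action_words = set(action.lower().replace(":", " ").split())
    return float(len(question_words & action_words))


if __name__ == "__main__":
    action, score = select_action(
        "What is the capital of France", toy_propose, toy_critic, num_candidates=4
    )
    print(action, score)
```

In a real agent, `propose` would sample several search queries or answers from the language model and `critic` would be the trained process reward model; the selection step itself stays the same.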
