AI · Apr 20

LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

Wanli Li, Bince Qu, Bo Pan, Jianyu Zhang, Zheng Liu, Pan Zhang, Wei Chen, Bo Zhang

arXiv:2604.17931 · 52.3 · h-index: 6 · Has Code

Predicted impact: top 3% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For researchers building deep research agents, this framework solves the scalability bottleneck of agentic RL by replacing costly real-world search with a virtual environment, enabling strong performance with small models.

LiteResearcher introduces a scalable agentic RL training framework that uses a lite virtual world to mirror real-world search dynamics, enabling a 4B model to achieve open-source state-of-the-art results of 71.3% on GAIA and 78.0% on Xbench and to outperform larger models such as Claude-4.5 Sonnet.

Reinforcement learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and dependence on real-world search during RL training introduces instability and prohibitive cost, which limits the scalability of agentic RL. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on common benchmarks such as GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0%, respectively, demonstrating that scalable RL training is a key enabler for deep research agents.
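To make the idea of a "lite virtual world" concrete, here is a minimal sketch of an offline search environment that exposes search and page-visit tools over a fixed local corpus, so rollouts are deterministic and cost nothing per call. This is an illustration only, not the paper's implementation: the names `Page`, `VirtualSearchEnv`, `search`, and `visit`, the `local://` URLs, and the token-overlap ranking are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Page:
    """One document in the hypothetical local corpus."""
    url: str
    title: str
    text: str


def _tokens(s: str) -> Set[str]:
    # Lowercased alphanumeric tokens; a deliberately crude tokenizer.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


class VirtualSearchEnv:
    """Hypothetical offline stand-in for a web search tool (not the paper's code)."""

    def __init__(self, pages: List[Page]) -> None:
        self._pages: Dict[str, Page] = {p.url: p for p in pages}

    def search(self, query: str, k: int = 3) -> List[Dict[str, str]]:
        """Return up to k pages ranked by token overlap with the query."""
        q = _tokens(query)
        scored = []
        for page in self._pages.values():
            overlap = len(q & _tokens(page.title + " " + page.text))
            if overlap > 0:
                scored.append((overlap, page))
        # Highest overlap first; ties broken by URL so results are deterministic.
        scored.sort(key=lambda item: (-item[0], item[1].url))
        return [
            {"url": p.url, "title": p.title, "snippet": p.text[:80]}
            for _, p in scored[:k]
        ]

    def visit(self, url: str) -> str:
        """Return the full page text, or an empty string for unknown URLs."""
        page = self._pages.get(url)
        return page.text if page is not None else ""


# Usage: the agent's tool calls hit the local corpus instead of the live web.
env = VirtualSearchEnv([
    Page("local://a", "GAIA benchmark", "GAIA is a benchmark for general AI assistants."),
    Page("local://b", "Cooking pasta", "Boil water, add salt, cook the pasta."),
])
hits = env.search("GAIA benchmark assistants")
```

Because the corpus is fixed, the same query always returns the same results, which removes the instability and per-call cost the abstract attributes to live search during RL training. A real system would need a far larger corpus and a stronger retriever than this toy ranking.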

