CL · May 27

Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?

arXiv:2605.27881 · 33.6 · Has Code
Predicted impact: top 13% in CL · last 90 days
Originality: Synthesis-oriented
AI Analysis

For researchers training LLM-based search agents, this paper clarifies which factors (retrieval-corpus coverage, reward design, training protocols) actually drive performance, showing that fixing corpus coverage matters more than the choice of training algorithm.

The paper identifies a critical data-coverage issue in the widely used Wikipedia 2018 corpus for training search agents and shows that correcting it yields larger gains than differences between training algorithms. It also finds that simple outcome-based reward methods are competitive or superior to process-based methods, and provides practical guidelines for training effective search agents.

Search agents powered by large language models can autonomously decompose queries, retrieve information, and synthesize answers through multi-step reasoning. However, the rapid growth of training methods has outpaced controlled comparison: existing works differ in retrieval corpora, reward designs, and training protocols, making it unclear what actually drives improvements. We present a controlled empirical study that isolates three under-explored dimensions of search agent training. First, we identify a critical data-coverage issue in the widely used Wikipedia 2018 corpus and show that correcting it alone yields larger gains than the differences between training algorithms. Second, we systematically compare outcome-based and process-based reward methods across three base models, finding that the simplest outcome-based approach achieves competitive or superior performance in most settings, and that process-level credit assignment can over-correct agent behavior. Third, we analyze training data diversity, off-policy data utilization, and search budget scaling, distilling practical guidelines for training effective search agents. Our code is available at https://github.com/YiboZhao624/SearchAgentReview.
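To make the "simplest outcome-based approach" concrete, here is a minimal sketch of an outcome-only reward. It is not taken from the paper's code. It assumes exact-match scoring with SQuAD-style answer normalization, and the function names (`normalize`, `outcome_reward`, `trajectory_rewards`) are hypothetical. The point it illustrates is that an outcome-based reward scores only the final answer and gives every step of the trajectory the same credit, whereas a process-based method would score individual search or reasoning steps.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper may use a different metric.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized final answer matches any gold answer, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in gold_answers))


def trajectory_rewards(num_steps: int, prediction: str, gold_answers: list[str]) -> list[float]:
    """Outcome-based credit assignment: every step shares the final scalar reward.

    A process-based method would instead compute a separate score per step.
    """
    reward = outcome_reward(prediction, gold_answers)
    return [reward] * num_steps


if __name__ == "__main__":
    # Hypothetical three-step rollout (search, search, answer).
    print(trajectory_rewards(3, "The Eiffel Tower.", ["Eiffel Tower"]))
```

Because the reward depends only on the final answer, intermediate search calls are neither rewarded nor penalized individually. That is one plausible reading of why such a scheme avoids the over-correction the abstract attributes to process-level credit assignment.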

