CL | Oct 28, 2025

Reinforcement Learning for Long-Horizon Multi-Turn Search Agents

arXiv:2510.24126v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing long-horizon reasoning for AI agents in domain-specific applications such as legal search. The advance is incremental, since it builds on existing RL and LLM techniques.

The paper aims to improve LLM agents' performance on complex multi-turn search tasks by applying reinforcement learning instead of prompt-based methods. On a legal document search benchmark, the RL-trained model reaches 85% accuracy versus 78% for frontier models.

Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14-billion-parameter model outperforms frontier-class models (85% vs. 78% accuracy). In addition, we explore turn-restricted regimes, during training and at test time, which show that these agents achieve better results when allowed to operate over longer multi-turn horizons.

