AI · CL · Oct 14, 2025

DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

arXiv:2510.12979v1 · 2 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of enhancing planning capabilities in deep research agents and represents an incremental improvement over existing methods.

The paper targets the under-optimized planning stage of deep research agents. It proposes DeepPlanner, an RL framework that shapes token-level advantages with an entropy-based term, and reports improved planning quality and state-of-the-art performance across seven benchmarks at lower training cost.

Large language models (LLMs) augmented with multi-step reasoning and action generation abilities have shown promise in leveraging external tools to tackle complex tasks that require long-horizon planning. However, existing approaches either rely on implicit planning in the reasoning stage or introduce explicit planners without systematically addressing how to optimize the planning stage. As evidence, we observe that under vanilla reinforcement learning (RL), planning tokens exhibit significantly higher entropy than other action tokens, revealing uncertain decision points that remain under-optimized. To address this, we propose DeepPlanner, an end-to-end RL framework that effectively enhances the planning capabilities of deep research agents. Our approach shapes token-level advantage with an entropy-based term to allocate larger updates to high entropy tokens, and selectively upweights sample-level advantages for planning-intensive rollouts. Extensive experiments across seven deep research benchmarks demonstrate that DeepPlanner improves planning quality and achieves state-of-the-art results under a substantially lower training budget.
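To make the two mechanisms in the abstract concrete, here is a minimal sketch of entropy-based advantage shaping. It is an illustration under stated assumptions, not the paper's formula: the abstract does not give the exact functional form, so the clipped, sign-preserving additive bonus, the flat sample-level multiplier, and the names `alpha`, `clip_ratio`, `plan_weight`, and `is_planning_intensive` are all hypothetical.

```python
import math
from typing import List


def token_entropy(logits: List[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shape_advantages(
    advantage: float,
    entropies: List[float],
    is_planning_intensive: bool,
    alpha: float = 0.1,        # hypothetical scale of the entropy term
    clip_ratio: float = 0.5,   # hypothetical cap: bonus <= clip_ratio * |advantage|
    plan_weight: float = 1.5,  # hypothetical sample-level upweight
) -> List[float]:
    """Return one shaped advantage per token of a single rollout.

    Assumed form (not taken from the paper):
      1. Sample level: multiply the rollout's advantage by plan_weight
         when the rollout is flagged as planning-intensive.
      2. Token level: add a bonus proportional to the token's entropy,
         clipped so it cannot dominate the base advantage, and signed
         so that high-entropy tokens always get a larger-magnitude update.
    """
    base = advantage * plan_weight if is_planning_intensive else advantage
    sign = math.copysign(1.0, base)
    cap = clip_ratio * abs(base)
    return [base + sign * min(alpha * h, cap) for h in entropies]
```

For example, with the defaults above, a rollout with advantage 1.0 and token entropies `[0.0, 2.0]` yields shaped advantages of roughly `[1.0, 1.2]`, so the high-entropy token receives the larger update. Clipping the bonus relative to the base advantage is a design choice in this sketch: it keeps the entropy term from overriding the reward signal.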

