Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning

Wenyi Wu, Sibo Zhu, Kun Zhou, Biwei Huang

arXiv:2605.02168

AI Analysis

For researchers building LM-based agents for complex tasks, this work provides a compute-efficient method to improve long-horizon planning by focusing on the planner role, though the modular decomposition itself is not novel.

The paper proposes a multi-agent framework for long-horizon planning that decomposes tasks into planner, actor, and memory manager roles, and introduces a planner-centric reinforcement learning approach that optimizes only the planner using VLM-as-judge rewards. Experiments on web navigation, OS control, and tool use benchmarks show compute-efficient improvements in long-horizon agent automation.

Language model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we propose an enhanced multi-agent framework that decomposes automation into three roles: a planner for high-level decision-making, an actor for task execution, and a memory manager for contextual reasoning. While this modular decomposition aligns with established design patterns, our core contribution lies in a systematic compute-allocation analysis, revealing that planning is the dominant factor influencing task performance. Execution and memory management require significantly less compute and model capacity to achieve competitive results. Building on these insights, we introduce a planner-centric reinforcement learning approach, which exclusively optimizes the planner using trajectory-level rewards from a VLM-as-judge, while freezing the other components. Extensive experiments on benchmarks spanning web navigation, OS control, and tool use demonstrate that concentrating model capacity and learning on high-level planning yields robust and compute-efficient improvements in long-horizon agent automation. Our code is publicly released.
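The role split and the planner-only update described in the abstract can be illustrated with a small control-flow sketch. This is a toy under stated assumptions, not the authors' released code: the names `Memory`, `ToyPlanner`, `toy_actor`, `toy_judge`, `run_episode`, and `train_step` are hypothetical, the judge is a rule-based stand-in for a VLM-as-judge, and `ToyPlanner.update` only logs the reward where a real system would apply an RL update to the planner's parameters.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Memory:
    """Memory manager (hypothetical): keeps a bounded log of recent steps."""
    max_items: int = 5
    notes: List[str] = field(default_factory=list)

    def update(self, subgoal: str, observation: str) -> None:
        self.notes.append(f"{subgoal} -> {observation}")
        self.notes = self.notes[-self.max_items:]  # drop the oldest entries

    def context(self) -> str:
        return "\n".join(self.notes)


class ToyPlanner:
    """Planner (hypothetical): the only component that receives updates."""

    def __init__(self, plan: List[str]):
        self.plan = list(plan)
        self.reward_log: List[float] = []  # stands in for trainable parameters

    def __call__(self, task: str, context: str, step: int) -> str:
        # A real planner would condition on task and context; this one
        # simply replays a fixed plan and then signals completion.
        return self.plan[step] if step < len(self.plan) else "DONE"

    def update(self, reward: float) -> None:
        # Placeholder for an RL update driven by a trajectory-level reward.
        self.reward_log.append(reward)


def toy_actor(subgoal: str) -> str:
    """Actor (frozen, hypothetical): executes a subgoal, returns an observation."""
    return f"ok:{subgoal}"


def toy_judge(task: str, trajectory: List[Tuple[str, str]]) -> float:
    """Rule-based stand-in for a VLM-as-judge: one scalar per trajectory."""
    succeeded = bool(trajectory) and all(obs.startswith("ok") for _, obs in trajectory)
    return 1.0 if succeeded else 0.0


def run_episode(task, planner, actor, memory, max_steps: int = 10):
    """Planner picks a subgoal, actor executes it, memory records the result."""
    trajectory = []
    for step in range(max_steps):
        subgoal = planner(task, memory.context(), step)
        if subgoal == "DONE":
            break
        observation = actor(subgoal)
        memory.update(subgoal, observation)
        trajectory.append((subgoal, observation))
    return trajectory


def train_step(task, planner, actor, memory, judge) -> float:
    """Score the whole trajectory once; update the planner only."""
    trajectory = run_episode(task, planner, actor, memory)
    reward = judge(task, trajectory)
    planner.update(reward)  # actor and memory manager are left untouched
    return reward


if __name__ == "__main__":
    planner = ToyPlanner(["open settings", "enable wifi"])
    print(train_step("turn on wifi", planner, toy_actor, Memory(), toy_judge))
```

The point of the sketch is the asymmetry in `train_step`: the reward is computed once per trajectory and flows only to the planner, while the actor and memory manager stay fixed.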
