AI CL MA
Jan 21

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

Zixuan Ke, Yifei Ming, Austin Xu, Ryan Chin, Xuan-Phi Nguyen, Prathyusha Jwalapuram, Semih Yavuz, Caiming Xiong, Shafiq Joty

arXiv:2601.14652v111.911 citations, h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses the methodological complexity and efficacy uncertainty in multi-agent systems for AI researchers and practitioners, offering a novel framework and benchmark, though it appears incremental in improving existing MAS approaches.

The paper addresses multi-agent system (MAS) design by proposing MAS-Orchestra, a framework that formulates MAS orchestration as a function-calling reinforcement learning problem with holistic orchestration. It reports consistent improvements on public benchmarks covering mathematical reasoning, multi-hop QA, and search-based QA. The paper also introduces MASBENCH, a controlled benchmark for studying when and why MAS are beneficial. Its analysis shows that gains depend on task structure, verification protocols, and the capabilities of both the orchestrator and the sub-agents, rather than holding universally.

While multi-agent systems (MAS) promise elevated intelligence through coordination of agents, current approaches to automatic MAS design under-deliver. Such shortcomings stem from two key factors: (1) methodological complexity - agent orchestration is performed using sequential, code-level execution that limits global system-level holistic reasoning and scales poorly with agent complexity - and (2) efficacy uncertainty - MAS are deployed without understanding if there are tangible benefits compared to single-agent systems (SAS). We propose MAS-Orchestra, a training-time framework that formulates MAS orchestration as a function-calling reinforcement learning problem with holistic orchestration, generating an entire MAS at once. In MAS-Orchestra, complex, goal-oriented sub-agents are abstracted as callable functions, enabling global reasoning over system structure while hiding internal execution details. To rigorously study when and why MAS are beneficial, we introduce MASBENCH, a controlled benchmark that characterizes tasks along five axes: Depth, Horizon, Breadth, Parallel, and Robustness. Our analysis reveals that MAS gains depend critically on task structure, verification protocols, and the capabilities of both orchestrator and sub-agents, rather than holding universally. Guided by these insights, MAS-Orchestra achieves consistent improvements on public benchmarks including mathematical reasoning, multi-hop QA, and search-based QA. Together, MAS-Orchestra and MASBENCH enable better training and understanding of MAS in the pursuit of multi-agent intelligence.
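The idea of abstracting sub-agents as callable functions and emitting an entire MAS at once can be illustrated with a small sketch. This is not the authors' implementation: the `Call` record, the `run_plan` executor, and the toy `solver` and `verifier` sub-agents are all hypothetical names invented here, and the reinforcement learning that would produce the plan is omitted. The sketch only shows what it might mean for an orchestrator to output a complete system structure in one shot while sub-agent internals stay hidden behind a function signature.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sub-agents: each is a plain callable that hides its internals.
def solver(task: str, inputs: List[str]) -> str:
    return f"solve({task})"

def verifier(task: str, inputs: List[str]) -> str:
    return f"verify({', '.join(inputs)})"

SUB_AGENTS: Dict[str, Callable[[str, List[str]], str]] = {
    "solver": solver,
    "verifier": verifier,
}

@dataclass
class Call:
    name: str                                   # unique id of this call in the plan
    agent: str                                  # key into SUB_AGENTS
    depends_on: List[str] = field(default_factory=list)

def run_plan(task: str, plan: List[Call]) -> Dict[str, str]:
    """Execute a complete plan; each call must appear after its dependencies."""
    results: Dict[str, str] = {}
    for call in plan:
        # Raises KeyError if a dependency has not been executed yet.
        inputs = [results[dep] for dep in call.depends_on]
        results[call.name] = SUB_AGENTS[call.agent](task, inputs)
    return results

# The whole system (two parallel solvers feeding one verifier) is specified
# up front, rather than being built one agent at a time.
plan = [
    Call("a", "solver"),
    Call("b", "solver"),
    Call("check", "verifier", depends_on=["a", "b"]),
]
results = run_plan("2+2", plan)
```

In this toy version the plan is written by hand; in a trained system, a structure of this kind would be what the orchestrator generates, which is why the executor only needs each sub-agent's name and dependencies and not its internal steps.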
