MA · Apr 13

Can Small Agents Collaborate to Beat a Single Large Language Model?

Agata Żywot, Xinyi Chen, Yifei Yuan, Anders Søgaard, Maarten de Rijke

arXiv:2601.11327 · 88.7 · h-index: 20

AI Analysis

For researchers and practitioners building language model systems, this work suggests that, on agentic tasks, architectural orchestration can matter more than model scaling.

The paper investigates whether multi-agent systems composed of smaller language models can outperform a single large language model on tool-intensive benchmarks. Results show that small multi-agent systems can surpass much larger single-agent models, with orchestrator capacity being the primary driver of performance rather than sub-agent capacity.

Recent progress in language modeling has largely relied on scaling model size, yet larger models do not reliably improve performance on tasks requiring multi-step reasoning and tool use. Multi-agent collaboration offers a potential alternative, raising a key question: can well-organized systems built from smaller models outperform much larger language models? We address this question using a minimally designed multi-agent system with a single orchestrator and a small set of specialized sub-agents with restricted communication. On tool-intensive benchmarks spanning factual retrieval, multi-hop reasoning, scientific question answering, and mathematical problem solving, we conduct controlled comparisons between small multi-agent systems and large single-agent models. We find that small multi-agent systems can outperform substantially larger single-agent models, even when the latter have direct access to tools. Reasoning at the orchestrator yields the largest gains, while enabling reasoning in sub-agents provides limited or negative benefits. Overall system performance is driven primarily by orchestrator capacity rather than sub-agent capacity. These results suggest that improved agentic performance depends more on architectural orchestration than on raw model scaling.
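The architecture described in the abstract (one orchestrator, a few specialized sub-agents, restricted communication) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `plan` and `synthesize` callables, and the toy "models" are all hypothetical stand-ins, chosen so the example runs without any language model. The only property it tries to capture is that sub-agents see just the task handed to them and report back only to the orchestrator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a language model call: prompt in, text out.
ModelFn = Callable[[str], str]


@dataclass
class SubAgent:
    name: str
    model: ModelFn

    def run(self, task: str) -> str:
        # Restricted communication (assumed form): a sub-agent receives only
        # its own task and never sees other sub-agents' inputs or outputs.
        return self.model(task)


@dataclass
class Orchestrator:
    # plan: question -> ordered list of (sub-agent name, task) pairs
    plan: Callable[[str], List[Tuple[str, str]]]
    # synthesize: (question, sub-agent results) -> final answer
    synthesize: Callable[[str, List[str]], str]
    sub_agents: Dict[str, SubAgent]

    def answer(self, question: str) -> str:
        results = [
            self.sub_agents[name].run(task)
            for name, task in self.plan(question)
        ]
        return self.synthesize(question, results)


# Toy "models" so the sketch is runnable; a real system would call an LLM.
FACTS = {"boiling point of water in celsius": "100"}


def lookup_model(task: str) -> str:
    return FACTS.get(task.lower(), "unknown")


def adder_model(task: str) -> str:
    return str(sum(int(token) for token in task.split()))


def toy_plan(question: str) -> List[Tuple[str, str]]:
    # A fixed plan for illustration; in practice the orchestrator model
    # would decide which sub-agent handles which sub-task.
    return [
        ("retrieval", "boiling point of water in celsius"),
        ("math", "100 273"),
    ]


def toy_synthesize(question: str, results: List[str]) -> str:
    return " | ".join(results)


system = Orchestrator(
    plan=toy_plan,
    synthesize=toy_synthesize,
    sub_agents={
        "retrieval": SubAgent("retrieval", lookup_model),
        "math": SubAgent("math", adder_model),
    },
)
result = system.answer("What is the boiling point of water in kelvin?")
print(result)
```

In this layout, all planning and synthesis sit in the orchestrator, which is where the abstract reports the largest gains from added reasoning and capacity; the sub-agents are deliberately simple executors.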
