Apr 13

Can Small Agents Collaborate to Beat a Single Large Language Model?

arXiv:2601.11327 | 88.7 | h-index: 20
AI Analysis

For researchers and practitioners building language model systems, this work demonstrates that architectural orchestration can be more impactful than model scaling for agentic tasks.

The paper investigates whether multi-agent systems composed of smaller language models can outperform a single large language model on tool-intensive benchmarks. Results show that small multi-agent systems can surpass much larger single-agent models, with orchestrator capacity being the primary driver of performance rather than sub-agent capacity.

Recent progress in language modeling has largely relied on scaling model size, yet larger models do not reliably improve performance on tasks requiring multi-step reasoning and tool use. Multi-agent collaboration offers a potential alternative, raising a key question: can well-organized systems built from smaller models outperform much larger language models? We address this question using a minimally designed multi-agent system with a single orchestrator and a small set of specialized sub-agents with restricted communication. On tool-intensive benchmarks spanning factual retrieval, multi-hop reasoning, scientific question answering, and mathematical problem solving, we conduct controlled comparisons between small multi-agent systems and large single-agent models. We find that small multi-agent systems can outperform substantially larger single-agent models, even when the latter have direct access to tools. Reasoning at the orchestrator yields the largest gains, while enabling reasoning in sub-agents provides limited or negative benefits. Overall system performance is driven primarily by orchestrator capacity rather than sub-agent capacity. These results suggest that improved agentic performance depends more on architectural orchestration than on raw model scaling.
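
The architecture described in the abstract (one orchestrator, a small set of specialized sub-agents, restricted communication) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `SubAgent`, `Orchestrator`, `toy_policy`, `retriever_model`, and `calculator_model` are hypothetical, and plain Python functions stand in for language models and tools. The only point it illustrates is the control flow, in which the orchestrator alone decides what to ask next and sub-agents never see each other's output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any function from a prompt string to a reply string.
Model = Callable[[str], str]
Step = Tuple[str, str]             # (agent name or "final", subtask or answer)
History = List[Tuple[str, str, str]]  # (agent name, subtask, reply)


@dataclass
class SubAgent:
    name: str
    model: Model

    def run(self, subtask: str) -> str:
        # Restricted communication: the sub-agent sees only its own subtask.
        return self.model(subtask)


@dataclass
class Orchestrator:
    policy: Callable[[str, History], Step]  # stands in for the orchestrator model
    agents: Dict[str, SubAgent]
    max_steps: int = 8

    def answer(self, question: str) -> str:
        history: History = []
        for _ in range(self.max_steps):
            target, payload = self.policy(question, history)
            if target == "final":
                return payload
            reply = self.agents[target].run(payload)
            history.append((target, payload, reply))
        raise RuntimeError("step budget exhausted without a final answer")


# Toy stand-ins for tool-using sub-agents (hypothetical data and behavior).
FACTS = {"apples": "3", "pears": "4"}


def retriever_model(subtask: str) -> str:
    return FACTS.get(subtask, "unknown")


def calculator_model(subtask: str) -> str:
    # Handles only sums of integers written as "a+b".
    return str(sum(int(part) for part in subtask.split("+")))


def toy_policy(question: str, history: History) -> Step:
    # A hard-coded plan; in a real system this decision would come from a model.
    if len(history) == 0:
        return ("retriever", "apples")
    if len(history) == 1:
        return ("retriever", "pears")
    if len(history) == 2:
        return ("calculator", f"{history[0][2]}+{history[1][2]}")
    return ("final", history[2][2])


system = Orchestrator(
    policy=toy_policy,
    agents={
        "retriever": SubAgent("retriever", retriever_model),
        "calculator": SubAgent("calculator", calculator_model),
    },
)
result = system.answer("How many apples and pears are there in total?")
```

In this sketch all planning lives in `policy`, which mirrors the abstract's finding that orchestrator capacity matters more than sub-agent capacity: swapping in a stronger `policy` changes which subtasks are issued, while the sub-agents stay simple.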
