MA AI

Mar 1

Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

Yuzhe Zhang, Feiran Liu, Yi Shan, Xinyi Huang, Xin Yang, Yueqi Zhu, Xuxin Cheng, Cao Liu, Ke Zeng, Terry Jingchen Zhang, Wenyuan Jiang

arXiv:2603.01045v1

4.34 citations

h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses a fundamental limitation in scaling multi-agent LLM systems, of interest to AI researchers and practitioners: it reveals a critical bottleneck rather than offering an incremental improvement.

The paper asks whether multi-agent LLM systems can reliably compute with distributed information. It finds that agents form appropriate coordination topologies yet systematically fail to synthesize distributed state into correct answers, and that coordination overhead eliminates parallelization gains at scale.

Large language models are increasingly deployed in multi-agent systems to overcome context limitations by distributing information across agents. Yet whether agents can reliably compute with distributed information -- rather than merely exchange it -- remains an open question. We introduce Silo-Bench, a role-agnostic benchmark of 30 algorithmic tasks across three communication complexity levels, evaluating 54 configurations over 1,620 experiments. Our experiments expose a fundamental Communication-Reasoning Gap: agents spontaneously form task-appropriate coordination topologies and exchange information actively, yet systematically fail to synthesize distributed state into correct answers. The failure is localized to the reasoning-integration stage -- agents often acquire sufficient information but cannot integrate it. This coordination overhead compounds with scale, eventually eliminating parallelization gains entirely. These findings demonstrate that naively scaling agent count cannot circumvent context limitations, and Silo-Bench provides a foundation for tracking progress toward genuinely collaborative multi-agent systems.
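The experiment count in the abstract can be sanity-checked with a short sketch. It assumes that each of the 54 configurations is run once on each of the 30 tasks; the abstract does not state this explicitly, and the function and constant names below are illustrative only.

```python
# Sanity check of the experiment count reported in the abstract.
# Assumption (not stated in the abstract): every configuration is
# evaluated exactly once on every task.

NUM_TASKS = 30               # algorithmic tasks, from the abstract
NUM_CONFIGURATIONS = 54      # evaluated configurations, from the abstract
REPORTED_EXPERIMENTS = 1620  # total experiments, from the abstract


def total_runs(num_tasks: int, num_configurations: int) -> int:
    """Number of runs if each configuration is paired with each task once."""
    return num_tasks * num_configurations


experiments = total_runs(NUM_TASKS, NUM_CONFIGURATIONS)
print(experiments, experiments == REPORTED_EXPERIMENTS)
```

Under that assumption the product matches the reported total, which is consistent with a full task-by-configuration grid.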

