MA CL · May 7, 2025

Benchmarking LLMs' Swarm intelligence

Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun

arXiv:2505.04364v4 · 4.33 citations · h-index: 5 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need to assess LLMs for future decentralized intelligent systems, though its contribution is primarily incremental: it provides a new benchmark rather than a novel method.

The researchers tackled the problem of evaluating Large Language Models' ability to coordinate in decentralized multi-agent systems under swarm-like constraints, introducing SwarmBench as a benchmark with five coordination tasks. Results showed current LLMs exhibit significant task-dependent performance variations and struggle with robust long-range planning and adaptive strategy formation in these scenarios.

Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints (limited local perception and communication) remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs significantly struggle with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit (built on a customizable physical system) providing environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
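The local-perception constraint mentioned in the abstract (each agent sees only a $k\times k$ window of the 2D grid) can be illustrated with a minimal sketch. This is not taken from the SwarmBench code: the function name `local_view`, the character-grid encoding, and the `#` padding symbol for out-of-bounds cells are all assumptions made for illustration only.

```python
from typing import List


def local_view(grid: List[List[str]], row: int, col: int, k: int,
               pad: str = "#") -> List[List[str]]:
    """Return the k x k window centred on (row, col).

    Hypothetical helper, not the SwarmBench API. Cells that fall
    outside the grid are filled with `pad` (an assumed convention).
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    half = k // 2
    n_rows, n_cols = len(grid), len(grid[0])
    view = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            line.append(grid[r][c] if inside else pad)
        view.append(line)
    return view


# Toy 4 x 4 world with two agents, "A" and "B" (illustrative only).
grid = [list(line) for line in [
    "....",
    ".A..",
    "..B.",
    "....",
]]

# Agent A at (1, 1) sees only its 3 x 3 neighbourhood.
view_a = local_view(grid, 1, 1, 3)
for line in view_a:
    print("".join(line))
```

In this toy setup, agent A can see B because B lies inside its 3 x 3 window; anything farther away is invisible to A and would have to be learned through local communication, which is the kind of restriction the benchmark describes.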
