MA | AI | CL | CY | Mar 3, 2025

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

arXiv:2503.01935v1 | 120 citations | h-index: 29 | Has Code | ACL
Originality: Incremental advance
AI Analysis

MultiAgentBench gives researchers and developers working on multi-agent LLM systems a new benchmark, though the advance is incremental because it builds on existing single-agent and domain-specific benchmarks.

The authors address the lack of comprehensive benchmarks for evaluating multi-agent coordination and competition in LLM-based systems by introducing MultiAgentBench, which measures both task completion and collaboration quality across diverse scenarios. Results show that GPT-4o-mini achieved the highest average task score, the graph structure performed best among coordination protocols in the research scenario, and cognitive planning improved milestone achievement rates by 3%.

Large Language Models (LLMs) have shown remarkable capabilities as autonomous agents, yet existing benchmarks either focus on single-agent tasks or are confined to narrow domains, failing to capture the dynamics of multi-agent coordination and competition. In this paper, we introduce MultiAgentBench, a comprehensive benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios. Our framework measures not only task completion but also the quality of collaboration and competition using novel, milestone-based key performance indicators. Moreover, we evaluate various coordination protocols (including star, chain, tree, and graph topologies) and innovative strategies such as group discussion and cognitive planning. Notably, gpt-4o-mini reaches the highest average task score, the graph structure performs best among coordination protocols in the research scenario, and cognitive planning improves milestone achievement rates by 3%. Code and datasets are publicly available at https://github.com/MultiagentBench/MARBLE.
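
The four coordination topologies named in the abstract can be pictured as different sets of communication links between agents. The sketch below is only an illustration and is not taken from the MARBLE codebase: the function name `build_topology`, the agent labels, the choice of a binary hierarchy for "tree", and the choice of a fully connected mesh for "graph" are all assumptions made for this example.

```python
from itertools import combinations


def build_topology(kind, agents):
    """Return the undirected communication links for a given topology.

    Hypothetical helper for illustration only; each link is a frozenset
    holding the two agents that may exchange messages directly.
    """
    if len(agents) < 2:
        return set()

    if kind == "star":
        # Assumption: the first agent acts as the central hub.
        hub = agents[0]
        pairs = [(hub, other) for other in agents[1:]]
    elif kind == "chain":
        # Each agent talks only to its immediate neighbours in the list.
        pairs = list(zip(agents, agents[1:]))
    elif kind == "tree":
        # Assumption: a binary hierarchy where agent i reports to (i - 1) // 2.
        pairs = [(agents[(i - 1) // 2], agents[i]) for i in range(1, len(agents))]
    elif kind == "graph":
        # Assumption: a fully connected mesh, every pair may communicate.
        pairs = list(combinations(agents, 2))
    else:
        raise ValueError(f"unknown topology: {kind!r}")

    return {frozenset(pair) for pair in pairs}


if __name__ == "__main__":
    team = ["a0", "a1", "a2", "a3"]
    for kind in ("star", "chain", "tree", "graph"):
        links = sorted(tuple(sorted(link)) for link in build_topology(kind, team))
        print(kind, links)
```

Under these assumptions, star, chain, and tree each use one link fewer than the number of agents, while the mesh uses every possible pair, so it offers the most direct communication paths at the cost of more message traffic.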

Code Implementations: 1 repo
