CL · Jun 4

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

Jiaju Chen, Bo Sun, Yuxuan Lu, Yun Wang, Dakuo Wang, Bingsheng Yao

arXiv:2606.0639932.0

Predicted impact: top 35% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For researchers evaluating multi-agent LLM systems, this provides a theory-grounded methodology to assess collaborative competence beyond task outcomes, addressing a known bottleneck in MAS effectiveness.

The paper introduces CollabSim, a simulation framework grounded in CSCW theory for evaluating the collaborative competence of LLM-based multi-agent systems. Experiments across four LLMs show that CollabSim can capture condition effects, separate model performance patterns, and reveal task-dependent effects of agent design.

Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents' ability to coordinate through text-based channels much as human teams do. Yet recent work suggests that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, balance individual and collective incentives, and repair misalignment as interaction unfolds. Decades of research in Computer-Supported Cooperative Work (CSCW) have characterized these requirements for human teams coordinating under constrained communication, yet existing MAS evaluations focus mainly on task outcomes or on single-agent proficiency in reasoning, planning, and tool use. To enable a systematic analysis of agents' collaborative competence in MAS, we introduce CollabSim, a configurable simulation framework that combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states. Experiments across four LLMs show that CollabSim can capture condition effects, separate model performance patterns, and reveal task-dependent effects of agent design.
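The abstract describes crossing several LLMs with controlled interaction conditions and probing agents' internal states at each action. The following is a minimal sketch of that kind of controlled experimental design, assuming a simple full factorial layout; it is not CollabSim's implementation. The model names, the two condition factors (`channel`, `incentive`) and their levels, the `ProbeRecord` schema, and the `condition_effect` helper are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholders: four LLMs and two condition factors.
# These names and levels are illustrative assumptions, not CollabSim's actual conditions.
MODELS = ["model_a", "model_b", "model_c", "model_d"]
FACTORS = {
    "channel": ["open", "constrained"],       # communication constraint
    "incentive": ["aligned", "conflicting"],  # individual vs. collective incentives
}


@dataclass(frozen=True)
class Cell:
    """One experimental cell: a model run under one combination of condition levels."""
    model: str
    channel: str
    incentive: str


def build_design(models, factors):
    """Full factorial design: every model crossed with every combination of factor levels."""
    names = list(factors)
    cells = []
    for model in models:
        for levels in product(*(factors[n] for n in names)):
            cells.append(Cell(model=model, **dict(zip(names, levels))))
    return cells


@dataclass
class ProbeRecord:
    """Hypothetical action-level probe: the action an agent took, logged together with
    its self-reported belief about the shared task goal at that turn."""
    cell: Cell
    turn: int
    agent: str
    action: str
    believed_goal: str


def condition_effect(records, factor, level_a, level_b, score):
    """Mean per-record score under level_a minus the mean under level_b for one factor."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    a = [score(r) for r in records if getattr(r.cell, factor) == level_a]
    b = [score(r) for r in records if getattr(r.cell, factor) == level_b]
    return mean(a) - mean(b)
```

With two binary factors, `build_design(MODELS, FACTORS)` yields 16 cells (4 models × 4 condition combinations). A full factorial layout is one straightforward way to estimate condition effects separately from model differences; the scoring function passed to `condition_effect` (for example, whether a probe's `believed_goal` matches the true goal) is left to the experimenter.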

