MA · AI · May 10

CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

Chelsea Zou, Yiheng Yao, Selena She, Robert D. Hawkins

arXiv:2605.09823 · 80.3

Predicted impact: top 17% in MA · last 90 days · Originality: Incremental advance

AI Analysis

Provides a practical, verifiable benchmark for studying coordination-privacy trade-offs in decentralized multi-agent LLM systems.

CalBench introduces a controlled environment for evaluating multi-agent coordination in calendar scheduling, where agents with private calendars must schedule meetings while minimizing costs and preserving privacy. The benchmark enables precise measurement of coordination quality, communication efficiency, fairness, and privacy leakage.

We introduce CalBench, a controlled evaluation environment for studying multi-agent coordination through calendar scheduling. In CalBench, N agents each manage a private calendar containing pre-existing commitments and must coordinate to schedule a stream of M incoming meetings while minimizing disruption costs. Because agents observe only their own calendars, successful scheduling requires communication across private information boundaries. Each scenario is generated with an oracle solution, enabling precise measurement of coordination quality via realized-to-optimal cost, as well as a Distributed Constraint Optimization (DCOP) baseline to provide a fair comparison under the same private-information constraints. CalBench enables precise verification of task success, communication efficiency, and fairness in the distribution of disruption costs. Our environment also studies privacy-preserving coordination by augmenting calendar entries with private semantic contexts of varying sensitivity and measuring whether agents reveal task-irrelevant private information during negotiation. Unlike multi-agent benchmarks where a single capable agent can often substitute for the group, CalBench is inherently decentralized: no agent has access to another agent's private calendar, yet agents must still reach mutually consistent decisions over shared meeting scheduling. CalBench therefore provides a practical and verifiable setting for studying coordination protocols, communication efficiency, negotiation strategies, fairness, and privacy leakage in multi-agent systems.
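The realized-to-optimal cost measure and the fairness question raised in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `ScenarioResult` record, its field names, the zero-cost convention, and the max-minus-min spread used as a fairness proxy are hypothetical and are not taken from CalBench, whose exact metric definitions are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    # Hypothetical record; the field names are illustrative, not CalBench's.
    agent_costs: list[float]  # disruption cost realized by each agent
    oracle_cost: float        # total cost of the oracle solution


def cost_ratio(result: ScenarioResult) -> float:
    """Realized total cost divided by oracle total cost (1.0 is optimal)."""
    realized = sum(result.agent_costs)
    if result.oracle_cost == 0:
        # Assumed convention: a zero-cost oracle is matched only by
        # zero realized cost; anything else is treated as unbounded.
        return 1.0 if realized == 0 else float("inf")
    return realized / result.oracle_cost


def cost_spread(result: ScenarioResult) -> float:
    """Gap between the most and least burdened agent (0 is an even split)."""
    return max(result.agent_costs) - min(result.agent_costs)


result = ScenarioResult(agent_costs=[2.0, 1.0, 3.0], oracle_cost=4.0)
print(cost_ratio(result))   # 1.5
print(cost_spread(result))  # 2.0
```

In this toy case the agents pay 6.0 in total against an oracle cost of 4.0, giving a ratio of 1.5, and the most burdened agent pays 2.0 more than the least burdened one. A benchmark could equally use a different fairness statistic, such as a Gini coefficient; the spread is chosen here only because it is easy to read.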

