AI · RO · May 7

Randomness is sometimes necessary for coordination

Rohan Patil, Jai Malegaonkar, Henrik I. Christensen

arXiv:2605.06825 · 7.3

Predicted impact top 84% in AI · last 90 days · Originality Highly original

AI Analysis

This work solves a fundamental limitation of parameter sharing in cooperative MARL for homogeneous agents, enabling necessary coordination through structured randomness.

The paper addresses the problem of role differentiation in cooperative multi-agent reinforcement learning with homogeneous agents under permutation-symmetric observations, where shared deterministic policies fail. The proposed Diamond Attention architecture uses random scalar sampling per agent to enable a random-bit coordination protocol, achieving 1.0 success on the XOR game (vs. ~0.5 for baselines) and zero-shot transfer to varying team sizes and scenarios.

Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each agent samples a scalar random number per timestep, inducing a transient rank ordering that masks lower-ranked peers from agent-to-agent attention while leaving task attention fully unmasked. This realizes a random-bit coordination protocol in a single broadcast round, and the set-based attention enables zero-shot deployment to teams of different sizes. We evaluate across three regimes that isolate when structured randomness matters. On the perfectly symmetric XOR game, our method achieves $1.0$ success while all deterministic baselines plateau near $0.5$. On control coordination tasks, a policy trained on $N=4$ generalizes zero-shot to $N \in [2,8]$. On SMACLite cross-scenario transfer, we achieve zero-shot transfer where standard baselines cannot transfer due to structural limitations. Furthermore, replacing the structured mask with standard dropout-based randomness results in a 0\% win rate, confirming that protocol-space structure, not stochastic noise, is the operative ingredient. https://anonymous.4open.science/r/randomness-137A/
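
The rank-ordering idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the names `rank_mask` and `masked_softmax` are hypothetical, and it assumes that an agent keeps itself and every peer whose sampled scalar is at least its own, masking the rest. It also uses all-zero attention logits as a stand-in for perfectly identical agents. The abstract does not specify the mask direction or how the mask is wired into the full architecture.

```python
import numpy as np

def rank_mask(scores):
    """Boolean mask where mask[i, j] is True if agent i may attend to peer j.

    Assumption (not from the paper): agent i keeps peers whose sampled
    scalar is >= its own, so it always sees itself and masks lower-ranked peers.
    """
    scores = np.asarray(scores, dtype=float)
    return scores[None, :] >= scores[:, None]

def masked_softmax(logits, mask):
    """Row-wise softmax that assigns zero weight to masked entries."""
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_agents = 4

# Each agent samples one scalar for this timestep.
scores = rng.random(n_agents)
mask = rank_mask(scores)

# Identical agents produce identical logits; zeros model that symmetry.
logits = np.zeros((n_agents, n_agents))
unmasked = masked_softmax(logits, np.ones_like(mask))
weights = masked_softmax(logits, mask)

# Number of peers (self included) each agent can still attend to.
visible = mask.sum(axis=1)

print("scores :", np.round(scores, 3))
print("visible:", visible)
print("weights:\n", np.round(weights, 3))
```

Without the mask, every row of `unmasked` is the same uniform distribution, so the agents remain indistinguishable. With the mask, each agent attends to a different subset of peers, and when the sampled scalars are distinct the `visible` counts are a permutation of 1 to `n_agents`, which gives each agent a transient rank for that timestep.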
