LG · Oct 21, 2022

Equivariant Networks for Zero-Shot Coordination

arXiv:2210.12124v2 · 22 citations · h-index: 67
Originality: Highly original
AI Analysis

This paper addresses coordination failures in multi-agent systems, particularly in settings such as Hanabi, and offers a test-time improvement for pre-trained policies.

The paper tackles the problem of symmetry breaking in decentralized partially observable Markov decision processes (Dec-POMDPs) by introducing an equivariant network architecture, which improves zero-shot coordination and outperforms prior methods on the Hanabi benchmark.

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, where agents arbitrarily converge on one of many equivalent but mutually incompatible policies. Such cases commonly arise under partial observability, e.g. waving your right hand vs. your left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that leverages environmental symmetry to improve zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a "coordination-improvement operator" for generic, pre-trained policies, and thus may be applied at test time in conjunction with any self-play algorithm. We provide theoretical guarantees for our method and test it on the AI benchmark task of Hanabi, where our method outperforms other symmetry-aware baselines in zero-shot coordination and improves the coordination ability of a variety of pre-trained policies. In particular, we show that our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
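
To make the idea of applying symmetry to a pre-trained policy at test time concrete, here is a minimal toy sketch of group averaging, a standard way to turn an arbitrary function into an equivariant one. It is an illustration under stated assumptions, not the paper's architecture: the names `symmetrize` and `base_policy`, the 3-dimensional observation, and the choice of the full permutation group acting identically on observations and actions are all hypothetical.

```python
import itertools
import numpy as np

def symmetrize(policy, perms):
    """Wrap `policy` so its output is averaged over a permutation group.

    Assumption (toy setting): observations and action probabilities are
    vectors of the same length, and a permutation p acts on a vector x as x[p].
    """
    def sym_policy(obs):
        total = 0.0
        for p in perms:
            inv = np.argsort(p)                      # inverse permutation of p
            total = total + policy(obs[p])[inv]      # permute input, undo on output
        return total / len(perms)
    return sym_policy

# Hypothetical "pre-trained" policy: a softmax over a random linear map.
# It has no built-in symmetry, so it stands in for an arbitrary self-play policy.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))

def base_policy(obs):
    logits = W @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

# All permutations of 3 elements form the group used in this toy example.
perms = [np.array(p) for p in itertools.permutations(range(3))]
sym_policy = symmetrize(base_policy, perms)

obs = rng.normal(size=3)
q = perms[3]
# Equivariance check: permuting the observation permutes the action
# probabilities in the same way.
print(np.allclose(sym_policy(obs[q]), sym_policy(obs)[q]))
```

Because the wrapper only queries the base policy, it needs no retraining, which is the sense in which such an operator can be applied at test time. The cost grows with the size of the group, so this brute-force average is only practical for small groups.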

Code Implementations: 1 repo
