LGOct 21, 2022

Equivariant Networks for Zero-Shot Coordination

Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster

arXiv:2210.12124v215.622 citationsh-index: 67Has Code

Originality Highly original

AI Analysis

This addresses coordination failures in multi-agent systems, particularly in scenarios like Hanabi, offering a test-time improvement for pre-trained policies.

The paper tackles the problem of symmetry breaking in decentralized partially observable Markov decision processes (Dec-POMDPs) by introducing an equivariant network architecture, which improves zero-shot coordination and outperforms prior methods on the Hanabi benchmark.

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry for improving zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a ``coordination-improvement operator'' for generic, pre-trained policies, and thus may be applied at test-time in conjunction with any self-play algorithm. We provide theoretical guarantees of our work and test on the AI benchmark task of Hanabi, where we demonstrate our methods outperforming other symmetry-aware baselines in zero-shot coordination, as well as able to improve the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.

View on arXiv PDF Code

Similar