AI · LG · MA · Jan 28, 2022

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

arXiv:2201.12436v1 · 37 citations
AI Analysis

This paper addresses the challenge of creating AI agents that can collaborate effectively with partners trained by different algorithms, which is crucial for real-world applications. The contribution is incremental, as it builds on prior diversity-based methods.

The paper tackles the problem of evaluating cooperative AI beyond self-play and cross-play by introducing inter-algorithm cross-play and showing that existing methods underperform in that setting. It then proposes the Any-Play augmentation, which achieves state-of-the-art performance in Hanabi.

Cooperative artificial intelligence with human or superhuman proficiency in collaborative tasks stands at the frontier of machine learning research. Prior work has tended to evaluate cooperative AI performance under the restrictive paradigms of self-play (teams composed of agents trained together) and cross-play (teams of agents trained independently but using the same algorithm). Recent work has indicated that AI optimized for these narrow settings may make for undesirable collaborators in the real world. We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, where agents are evaluated on teaming performance with all other agents within an experiment pool with no assumption of algorithmic similarities between agents. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm. We propose the Any-Play learning augmentation -- a multi-agent extension of diversity-based intrinsic rewards for zero-shot coordination (ZSC) -- for generalizing self-play-based algorithms to the inter-algorithm cross-play setting. We apply the Any-Play learning augmentation to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art performance in the collaborative card game Hanabi.
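
The three evaluation settings named in the abstract differ only in which pairs of agents are teamed up. The sketch below is one way to make that distinction concrete, not the paper's implementation: it assumes a two-player game, a precomputed table of mean team scores, and a simple unweighted average per setting. The function name `evaluate_pool`, the agent and algorithm labels, and every number in the toy table are hypothetical and are not results from the paper.

```python
from itertools import product
from statistics import mean


def evaluate_pool(agents, score):
    """Average team scores by evaluation setting.

    agents: list of (agent_name, algorithm_name) tuples (hypothetical labels).
    score:  dict mapping (agent_a, agent_b) -> mean game score for that pairing.
    Returns a dict with one average per setting, or None if a setting has no pairs.
    """
    buckets = {"self_play": [], "intra_algorithm_xp": [], "inter_algorithm_xp": []}
    for (a, alg_a), (b, alg_b) in product(agents, repeat=2):
        s = score[(a, b)]
        if a == b:
            # Same trained agent on both seats: self-play.
            buckets["self_play"].append(s)
        elif alg_a == alg_b:
            # Independently trained agents, same algorithm: cross-play.
            buckets["intra_algorithm_xp"].append(s)
        else:
            # Agents from different algorithms: inter-algorithm cross-play.
            buckets["inter_algorithm_xp"].append(s)
    return {k: (mean(v) if v else None) for k, v in buckets.items()}


# Toy pool: two agents from a hypothetical algorithm "alg_x", one from "alg_y".
toy_agents = [("x_0", "alg_x"), ("x_1", "alg_x"), ("y_0", "alg_y")]

# Made-up scores for illustration only (not taken from the paper).
toy_scores = {
    ("x_0", "x_0"): 24.0, ("x_1", "x_1"): 23.0, ("y_0", "y_0"): 22.0,
    ("x_0", "x_1"): 12.0, ("x_1", "x_0"): 12.0,
    ("x_0", "y_0"): 6.0, ("y_0", "x_0"): 6.0,
    ("x_1", "y_0"): 4.0, ("y_0", "x_1"): 4.0,
}

toy_result = evaluate_pool(toy_agents, toy_scores)
```

With the toy table above, `toy_result["inter_algorithm_xp"]` averages only the four pairings that mix `alg_x` and `alg_y` agents. A pool whose self-play average is high but whose inter-algorithm average is low would show the kind of gap the abstract describes.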

Code Implementations: 1 repo
