LG GT MA MLJan 23, 2019

Open-ended Learning in Symmetric Zero-sum Games

David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech M. Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel

arXiv:1901.08106v229.2199 citations

Originality Incremental advance

AI Analysis

This addresses the problem of improving agent performance in complex, nontransitive games for researchers and practitioners in AI and game theory, though it is incremental as it builds on prior methods like PSRO.

The paper tackles the challenge of open-ended learning in nontransitive zero-sum games, where strategic cycles make defining clear objectives difficult, and introduces a geometric framework and algorithm (PSRO_rN) that constructs diverse populations of effective agents, outperforming existing alternatives in resource allocation games.

Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.

View on arXiv PDF

Similar