LG · GT · MA · ML · Jun 18, 2020

Competitive Policy Optimization

arXiv:2006.10611v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses policy optimization in competitive multi-agent settings, is aimed at both researchers and practitioners, and offers incremental improvements over existing methods.

The paper tackles the challenge of designing efficient optimization methods with desirable convergence and stability properties for competitive Markov decision processes. It proposes competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic interactions between players. The results show that CoPO provides stable optimization, converges to sophisticated strategies, and achieves higher scores when played against baseline methods in challenging competitive games.

A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations, and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods, and empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.
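To make the contrast between linear and bilinear approximations concrete, here is a minimal sketch of competitive gradient descent (CGD), which we assume is the "competitive gradient optimization method" the abstract cites as motivation. It runs on the toy zero-sum game f(x, y) = x·y, where x minimizes and y maximizes. This is not CoPO itself: CoPO applies this kind of update to policy parameters, with gradient and mixed-derivative terms estimated from trajectories, which the sketch does not attempt. Each CGD step solves a local game in which both players see the bilinear cross term Δx·(∂²f/∂x∂y)·Δy, whereas simultaneous gradient descent-ascent (GDA) gives each player only a linear model of f. The function names, the step size `eta`, and the starting point are illustrative choices.

```python
import math
from typing import Callable, Tuple

# Toy zero-sum game f(x, y) = x * y: player x minimizes f, player y maximizes it.
# Its unique Nash equilibrium is (0, 0).
def grad_x(x: float, y: float) -> float:
    return y  # df/dx


def grad_y(x: float, y: float) -> float:
    return x  # df/dy


def d_xy(x: float, y: float) -> float:
    return 1.0  # mixed second derivative d^2 f / (dx dy)


def gda_step(x: float, y: float, eta: float) -> Tuple[float, float]:
    """Simultaneous gradient descent-ascent: each player uses only a linear model of f."""
    return x - eta * grad_x(x, y), y + eta * grad_y(x, y)


def cgd_step(x: float, y: float, eta: float) -> Tuple[float, float]:
    """Competitive gradient descent for scalar players in a zero-sum game.

    Returns the Nash equilibrium of the local game in which each player
    also sees the bilinear interaction term dx * d_xy * dy, plus a
    quadratic penalty 1 / (2 * eta) on the size of its own step.
    """
    gx, gy, b = grad_x(x, y), grad_y(x, y), d_xy(x, y)
    denom = 1.0 + eta**2 * b * b
    dx = -eta * (gx + eta * b * gy) / denom  # minimizing player
    dy = eta * (gy - eta * b * gx) / denom   # maximizing player
    return x + dx, y + dy


def run(
    step: Callable[[float, float, float], Tuple[float, float]],
    x0: float = 1.0,
    y0: float = 1.0,
    eta: float = 0.2,
    steps: int = 100,
) -> float:
    """Iterate a two-player update rule and return the final distance from (0, 0)."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = step(x, y, eta)
    return math.hypot(x, y)


if __name__ == "__main__":
    print(f"GDA distance to equilibrium: {run(gda_step):.4f}")
    print(f"CGD distance to equilibrium: {run(cgd_step):.4f}")
```

On this game GDA spirals outward, with the distance to the equilibrium (0, 0) growing by a factor of sqrt(1 + eta²) per step, while the CGD update shrinks it by the same factor. Accounting for the interaction term is what turns a diverging update into a converging one here.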

Code Implementations: 4 repos
