LGGTOCMLFeb 17, 2021

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

arXiv:2102.08903v224 citations
AI Analysis

This provides the first provably efficient solution for a fundamental problem in reinforcement learning and game theory, addressing a key bottleneck for large-scale applications.

The paper tackles the problem of obtaining optimization and statistical guarantees for policy-based methods with function approximation in two-player zero-sum Markov games, and presents a new algorithm that provably finds a near-optimal policy within a polynomial number of samples and iterations.

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes