LGAIGTJul 25, 2022

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

arXiv:2207.12463v112 citationsh-index: 48
Originality Highly original
AI Analysis

This work addresses the theoretical gap in competitive reinforcement learning for researchers, providing provable efficiency guarantees in structured game settings.

The paper tackles the problem of multi-agent policy optimization in competitive environments by proposing fictitious play algorithms for zero-sum Markov games with structured transitions, achieving tight regret bounds of O(√K) for each agent against adversarial opponents.

While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{K})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes