GT LG MA OCJun 19, 2025

Solving Zero-Sum Convex Markov Games

Fivos Kalogiannis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Ian Gemp, Georgios Piliouras

arXiv:2506.16120v14 citationsh-index: 14ICML

Originality Highly original

AI Analysis

This work addresses a fundamental challenge in multi-agent reinforcement learning for researchers and practitioners, providing theoretical foundations for convergence in complex strategic interactions, though it is incremental in building on recent definitions of convex Markov games.

The authors tackled the problem of achieving global convergence to Nash equilibria in two-player zero-sum convex Markov games, a challenging setting due to nonconvexity and lack of Bellman consistency, by introducing a nonconvex regularization method that transforms the problem into a nonconvex-proximal Polyak-Lojasiewicz objective, resulting in the first provable guarantees for independent policy gradient methods.

We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex--hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Lojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address the general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.

View on arXiv PDF

Similar