LG · Jan 11, 2021

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Constantinos Daskalakis, Dylan J. Foster, Noah Golowich

arXiv:2101.04233v129.2194 citations

Originality: Highly original

AI Analysis

This work provides a theoretical foundation for decentralized learning in competitive multi-agent systems, an area where prior work has largely focused on centralized, coordinated procedures for equilibrium computation.

The paper studies convergence in two-agent competitive reinforcement learning (zero-sum stochastic games) when both agents run independent policy gradient methods. It proves that their policies converge to a min-max equilibrium provided the learning rates follow a two-timescale rule. According to the authors, this is the first finite-sample convergence result for independent policy gradient methods in this setting.

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
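To make the two-timescale idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes a single-state zero-sum matrix game rather than a full stochastic game. It also assumes a directly parameterized policy with ε-greedy exploration and an importance-weighted gradient estimate built only from each player's own action and the observed reward. The names `project_simplex` and `independent_pg`, the payoff matrix, and the step sizes are all illustrative choices and do not come from the paper.

```python
import random


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j that passes gives the threshold
    return [max(vi - theta, 0.0) for vi in v]


def independent_pg(A, steps, eta_x, eta_y, eps=0.1, seed=0):
    """Two players update independently on the matrix game A.

    The row player (x) minimizes A[a][b]; the column player (y) maximizes it.
    Choosing eta_x much smaller than eta_y gives a two-timescale rule.
    Returns the row player's averaged policy and the column player's last policy.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    x_sum = [0.0] * n
    for _ in range(steps):
        # Mix with the uniform distribution so every action keeps
        # probability at least eps / (number of actions).
        x_play = [(1 - eps) * xi + eps / n for xi in x]
        y_play = [(1 - eps) * yi + eps / m for yi in y]
        a = rng.choices(range(n), weights=x_play)[0]
        b = rng.choices(range(m), weights=y_play)[0]
        r = A[a][b]
        # Each player uses only its own action and the scalar payoff r.
        gx = [0.0] * n
        gx[a] = r / x_play[a]
        gy = [0.0] * m
        gy[b] = r / y_play[b]
        x = project_simplex([xi - eta_x * g for xi, g in zip(x, gx)])  # descent
        y = project_simplex([yi + eta_y * g for yi, g in zip(y, gy)])  # ascent
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
    return [s / steps for s in x_sum], y


# Matching pennies: the unique equilibrium is uniform play for both players.
A = [[1.0, -1.0], [-1.0, 1.0]]
x_avg, y_last = independent_pg(A, steps=1000, eta_x=0.001, eta_y=0.01)
```

Neither update reads the opponent's policy or action, which is what makes the procedure independent. The only coupling is through the shared payoff `r`. The gap between `eta_x` and `eta_y` is a stand-in for the two-timescale rule described in the abstract; the precise rates required by the analysis are given in the paper and are not reproduced here.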
