MA, LG, OC
Nov 25, 2021

Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning

arXiv:2111.12961v3
5 citations
Originality: Incremental advance
AI Analysis

This work addresses scalability and efficiency issues in collaborative multi-agent systems, representing an incremental improvement over existing methods.

The paper tackles high variance and distribution shift in distributed policy gradient methods for multi-agent reinforcement learning. It proposes an approach that combines variance reduction with gradient tracking, and establishes sample and communication complexity bounds for finding an ε-approximate stationary point.

This paper studies distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents communicating over a network aim to find the optimal policy that maximizes the average of all agents' local returns. Because the performance function in policy gradient is non-concave, existing distributed stochastic optimization methods for convex problems cannot be applied directly to policy gradient in MARL. This paper proposes a distributed policy gradient method with variance reduction and gradient tracking to address the high variance of policy gradient, and uses importance weights to correct the distribution shift that arises in the sampling process. We then provide an upper bound on the mean-squared stationary gap, which depends on the number of iterations, the mini-batch size, the epoch size, the problem parameters, and the network topology. We further establish the sample and communication complexity required to obtain an ε-approximate stationary point. Numerical experiments validate the effectiveness of the proposed algorithm.
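To make the gradient tracking ingredient concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it omits variance reduction and importance weighting, uses exact gradients rather than sampled policy gradients, and replaces each agent's return with a hypothetical concave objective f_i(x) = -0.5 (x - b_i)^2. The names `gradient_tracking`, `targets`, and `mixing`, the ring topology, and the step size are all assumptions chosen for illustration. Each agent mixes its iterate with its neighbours' iterates, then ascends along a tracker that estimates the network-average gradient.

```python
def gradient_tracking(targets, mixing, step=0.1, iters=200):
    """Toy gradient tracking ascent (illustrative assumption, not the paper's method).

    Agent i holds f_i(x) = -0.5 * (x - targets[i])**2, so the average
    objective is maximised at the mean of `targets`.
    `mixing` is a doubly stochastic weight matrix over the network.
    """
    n = len(targets)

    def local_grad(i, x):
        # Exact gradient of f_i at x (a real policy gradient would be sampled).
        return targets[i] - x

    def mix(values):
        # One round of neighbour averaging: returns mixing @ values.
        return [sum(mixing[i][j] * values[j] for j in range(n)) for i in range(n)]

    x = [0.0] * n                                   # local iterates
    grads = [local_grad(i, x[i]) for i in range(n)]
    y = list(grads)                                 # trackers start at local gradients

    for _ in range(iters):
        x_mixed = mix(x)
        # Consensus step plus ascent along the tracked average gradient.
        x = [x_mixed[i] + step * y[i] for i in range(n)]
        new_grads = [local_grad(i, x[i]) for i in range(n)]
        y_mixed = mix(y)
        # Tracker update: mix, then add the change in the local gradient.
        y = [y_mixed[i] + new_grads[i] - grads[i] for i in range(n)]
        grads = new_grads
    return x


# Hypothetical 4-agent ring: each agent weights itself 0.5 and each neighbour 0.25.
mixing = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
targets = [1.0, 2.0, 3.0, 6.0]
result = gradient_tracking(targets, mixing)
```

In this toy setting every agent's iterate should approach the mean of `targets`, even though no agent ever sees another agent's objective. The trackers `y` carry the information about the average gradient across the network, which is the role gradient tracking plays in the method the abstract describes.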

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
