LG ML Mar 15, 2019

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

arXiv:1903.06372v316.872 citations

Originality: Incremental advance

AI Analysis

This work addresses distributed reinforcement learning for networked agents. It is an incremental advance that combines existing multi-agent and off-policy methods.

The paper extends off-policy reinforcement learning to multi-agent systems with networked communication and develops a new actor-critic algorithm with convergence guarantees under linear function approximation.

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm.
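To make the policy-evaluation idea concrete, here is a minimal sketch that pairs the standard single-agent emphatic TD(0) update with linear features and a generic consensus-averaging step over neighbors' parameters. It is not the paper's algorithm. It assumes every agent already knows the joint importance-sampling ratio, uses a constant interest of 1, and takes the mixing weights as given. The names `etd0_local_step` and `consensus_step` and all parameter names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def etd0_local_step(theta, F, x, x_next, reward, rho, rho_prev, gamma, alpha):
    """One emphatic TD(0) update for a single agent (illustrative sketch).

    theta    : this agent's linear value-function weights
    F        : follow-on trace from the previous step (0.0 before the first step)
    x, x_next: feature vectors of the current and next state
    reward   : this agent's local reward
    rho      : importance-sampling ratio for the current action (assumed known)
    rho_prev : importance-sampling ratio for the previous action
    """
    # Follow-on trace with constant interest 1; it plays the role of the emphasis.
    F_new = gamma * rho_prev * F + 1.0
    # Temporal-difference error under the linear value estimate.
    delta = reward + gamma * dot(theta, x_next) - dot(theta, x)
    # Emphasis-weighted, importance-corrected semi-gradient step.
    step = alpha * rho * F_new * delta
    theta_new = [t + step * xi for t, xi in zip(theta, x)]
    return theta_new, F_new


def consensus_step(thetas, weights):
    """Mix agents' weights: row i becomes sum_j weights[i][j] * thetas[j].

    weights[i][j] is assumed nonzero only for neighbors in the current graph,
    with each row summing to 1.
    """
    n, d = len(thetas), len(thetas[0])
    return [
        [sum(weights[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


# Two agents observe the same transition but receive different local rewards.
x, x_next = [1.0, 0.0], [0.0, 1.0]
local = [
    etd0_local_step([0.0, 0.0], 0.0, x, x_next, r, rho=2.0, rho_prev=1.0,
                    gamma=0.5, alpha=0.1)
    for r in (1.0, 3.0)
]
mixed = consensus_step([th for th, _ in local], [[0.5, 0.5], [0.5, 0.5]])
```

In a time-varying network the `weights` matrix would change from step to step. How the agents obtain the joint importance-sampling ratio from purely local information is left out of this sketch.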
