LG AI OC ML

Mar 21, 2019

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

arXiv:1903.09255v1

13.751 citations

Originality: Incremental advance

AI Analysis

This paper addresses coordination in multi-agent systems in which agents do not share local task information. The advance appears incremental because it builds on existing distributed actor-critic methods.

The paper tackles multi-agent reinforcement learning by proposing a distributed off-policy actor-critic method where agents maintain local estimates of the global optimal policy and use a consensus step to achieve agreement, validated with a distributed resource allocation example.

In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step to let all the agents asymptotically achieve agreement on the global optimal policy function. The convergence analysis of the proposed algorithm is provided and the effectiveness of the proposed algorithm is validated using a distributed resource allocation example. Compared to relevant distributed actor-critic methods, the agents here do not share information about their local tasks; instead, they coordinate to estimate the global policy function.
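To make the consensus idea concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the function names, the three-agent weight matrix `WEIGHTS`, and the use of generic gradient-like vectors in place of actor-critic updates are all assumptions made for illustration. It only shows the generic pattern the abstract describes, in which each agent adjusts its own estimate of a shared policy parameter and then averages with the other agents' estimates. The weight matrix is assumed to be doubly stochastic, so repeated averaging drives all local estimates toward their common mean.

```python
def consensus_step(thetas, weights):
    """One consensus round: agent i replaces its parameter vector with a
    weighted average of all agents' vectors, using weights[i][j]."""
    n = len(thetas)
    dim = len(thetas[0])
    return [
        [sum(weights[i][j] * thetas[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


def local_then_consensus(thetas, local_grads, weights, step_size):
    """Each agent takes a local gradient-style step, then all agents average.
    local_grads is a placeholder for whatever update direction each agent
    computes from its own task (hypothetical, not the paper's update rule)."""
    updated = [
        [t + step_size * g for t, g in zip(theta, grad)]
        for theta, grad in zip(thetas, local_grads)
    ]
    return consensus_step(updated, weights)


def run_consensus(thetas, weights, rounds):
    """Apply the consensus round repeatedly with no local updates."""
    for _ in range(rounds):
        thetas = consensus_step(thetas, weights)
    return thetas


def disagreement(thetas):
    """Largest coordinate-wise gap between any two agents' estimates."""
    dim = len(thetas[0])
    return max(
        max(t[k] for t in thetas) - min(t[k] for t in thetas) for k in range(dim)
    )


# Hypothetical doubly stochastic weights for three fully connected agents:
# every row and every column sums to 1.
WEIGHTS = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.25, 0.25, 0.5],
]
```

Calling `run_consensus` on differing initial estimates shrinks `disagreement` toward zero while leaving the average of the estimates unchanged, which is the sense in which the agents "achieve agreement". Note that only parameter estimates are exchanged in this sketch; nothing about an agent's local task crosses between agents.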
