LG · AI · OC · ML · Mar 21, 2019

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

arXiv:1903.09255v1 · 51 citations
AI Analysis

The paper addresses coordination in multi-agent systems without sharing local task information, although it appears incremental, since it builds on existing distributed actor-critic methods.

The paper tackles multi-agent reinforcement learning by proposing a distributed off-policy actor-critic method in which agents maintain local estimates of the global optimal policy and use a consensus step to reach agreement; the method is validated on a distributed resource allocation example.
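The consensus idea can be illustrated with a minimal sketch. It assumes a generic adapt-then-combine scheme, not the authors' exact update, and an undirected ring communication graph. Five hypothetical agents each hold a local copy of a shared parameter vector; each takes a step on a hypothetical private quadratic cost (a stand-in for its local actor-critic gradient estimate, written here as descent), then averages its copy with its neighbours' copies through a doubly stochastic mixing matrix `W`. Only parameter copies are exchanged, never the private `targets`. The names `ring_metropolis_weights`, `local_gradient`, `distributed_step`, `run`, and the step size `1/(k+2)` are illustrative choices.

```python
import numpy as np


def ring_metropolis_weights(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for an undirected ring of n >= 3 agents.

    Every node has degree 2, so Metropolis weights give 1/3 to the node itself
    and 1/3 to each of its two neighbours.
    """
    assert n >= 3, "a ring needs at least three agents"
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def local_gradient(theta: np.ndarray, target: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for each agent's local policy-gradient estimate:
    # gradient of the private cost f_i(theta) = 0.5 * ||theta - target_i||^2.
    return theta - target


def distributed_step(thetas, targets, W, step):
    # 1) Adapt: each agent moves its own copy using only its private target.
    adapted = thetas - step * local_gradient(thetas, targets)
    # 2) Combine (consensus): row i becomes sum_j W[i, j] * adapted[j],
    #    so only parameter copies travel between neighbours.
    return W @ adapted


def run(n_agents=5, dim=3, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    targets = rng.normal(size=(n_agents, dim))  # private, never exchanged
    thetas = rng.normal(size=(n_agents, dim))   # local copies of the shared parameter
    W = ring_metropolis_weights(n_agents)
    for k in range(iters):
        # Diminishing step size drives the residual disagreement toward zero.
        thetas = distributed_step(thetas, targets, W, step=1.0 / (k + 2))
    return thetas, targets


if __name__ == "__main__":
    thetas, targets = run()
    print("max disagreement:", np.abs(thetas - thetas.mean(axis=0)).max())
    print("max error to optimum:", np.abs(thetas - targets.mean(axis=0)).max())
```

With a diminishing step size the rows of `thetas` become nearly identical and approach `targets.mean(axis=0)`, the minimiser of the sum of the local costs, even though no agent ever sees another agent's target. A constant step size would instead leave a residual disagreement roughly proportional to the step.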

In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. We then introduce an additional consensus step that lets all the agents asymptotically reach agreement on the global optimal policy function. We provide a convergence analysis of the proposed algorithm and validate its effectiveness on a distributed resource allocation example. Compared to related distributed actor-critic methods, here the agents do not share information about their local tasks; instead, they coordinate to estimate the global policy function.
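The abstract does not specify the critic, so the following is a minimal sketch under an assumption: each agent fits a linear value estimate to its own local reward with a standard importance-sampled off-policy TD(0) update. Data come from a behaviour policy `mu`, and the ratio `rho = pi(a|s) / mu(a|s)` reweights each transition toward the target policy `pi`. Because the update touches only the agent's own reward, it illustrates how local value function estimates can be updated independently. The function name `off_policy_td0_update` and the defaults `gamma=0.95`, `beta=0.1` are hypothetical.

```python
import numpy as np


def off_policy_td0_update(w, phi_s, phi_next, reward, pi_prob, mu_prob,
                          gamma=0.95, beta=0.1):
    """One importance-weighted TD(0) step for a linear estimate v(s) = w @ phi(s).

    The transition (s, a, reward, s') was sampled from a behaviour policy mu;
    rho = pi(a|s) / mu(a|s) reweights it toward the target policy pi.
    `reward` is the agent's own local reward, so no task information is shared.
    """
    rho = pi_prob / mu_prob
    td_error = reward + gamma * (w @ phi_next) - (w @ phi_s)
    return w + beta * rho * td_error * phi_s


if __name__ == "__main__":
    w = np.zeros(2)
    phi_s, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot (tabular) features
    w = off_policy_td0_update(w, phi_s, phi_next, reward=1.0, pi_prob=0.5, mu_prob=0.25)
    print(w)  # rho = 2 and TD error = 1, so w holds the values 0.2 and 0.0
```

This plain update is chosen for brevity. With linear function approximation, off-policy TD(0) can diverge (Baird's counterexample is the classic case), which is why gradient-TD critics such as GTD2 or TDC are often used in off-policy actor-critic methods; which variant the paper uses is not stated here.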
