AI, MA · Oct 15, 2020

Multi-Agent Trust Region Policy Optimization

arXiv:2010.07916v3 · 64 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy and scalability issues in multi-agent systems for researchers and practitioners in reinforcement learning, though it is incremental as it extends an existing method.

The authors apply trust region policy optimization to multi-agent reinforcement learning by recasting the policy update as a distributed consensus optimization problem. The resulting decentralized algorithm, MATRPO, showed robust performance on two cooperative games.

We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. The agents do not need to know observations, rewards, policies or value/action-value functions of other agents. The agents only share a likelihood ratio with their neighbors during the training process. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
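The abstract says agents share only a likelihood ratio with their neighbors. As a rough illustration of what that quantity is, the sketch below computes a per-sample ratio pi_new(a|o) / pi_old(a|o) for one agent's discrete softmax policy, then runs one round of plain neighbor averaging over those ratios. The function names, the softmax parameterization, and the averaging rule are all assumptions made for illustration; the averaging step is a generic consensus iteration, not the update rule derived in the paper.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def likelihood_ratio(new_logits, old_logits, actions):
    # Per-sample ratio pi_new(a|o) / pi_old(a|o) for ONE agent.
    # new_logits, old_logits: (n_samples, n_actions); actions: (n_samples,)
    idx = np.arange(len(actions))
    p_new = softmax(new_logits)[idx, actions]
    p_old = softmax(old_logits)[idx, actions]
    return p_new / p_old


def consensus_step(ratios, adjacency):
    # Hypothetical consensus round (an assumption, not the paper's update):
    # each agent replaces its ratio vector with the mean over itself and
    # its neighbors. ratios: (n_agents, n_samples); adjacency: 0/1 matrix.
    weights = adjacency + np.eye(adjacency.shape[0])
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ ratios
```

Note that only the ratio vectors cross agent boundaries in this sketch; observations, rewards, and policy parameters stay local, which matches the privacy property the abstract describes.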

Code Implementations: 1 repo
