LG AI MA | Nov 22, 2021

Off-Policy Correction For Multi-Agent Reinforcement Learning

Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś

arXiv:2111.11229v3 | 6.53 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses scalability and training efficiency in multi-agent reinforcement learning (MARL), with applications such as gaming and robotics. The advance is incremental because it builds on the existing V-Trace method.

The authors propose MA-Trace, an on-policy actor-critic algorithm that extends V-Trace to multi-agent reinforcement learning (MARL). It achieves high performance on all tasks of the StarCraft Multi-Agent Challenge and exceeds state-of-the-art results on some of them.

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded: we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results on some of them.
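To make the importance-sampling correction concrete, here is a minimal NumPy sketch of V-Trace-style value targets with a joint importance weight. It is an illustration under stated assumptions, not the authors' implementation. It assumes the joint policy factorizes across agents, so the joint ratio is the product of per-agent ratios, and it uses the standard V-Trace recursion with truncation levels `rho_bar` and `c_bar`. The function name `vtrace_targets` and all argument names are hypothetical.

```python
import numpy as np

def vtrace_targets(log_pi, log_mu, rewards, values, bootstrap_value,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-Trace-style targets for one trajectory of length T.

    log_pi, log_mu: shape (T, n_agents), log-probabilities of the actions
        actually taken under the target and behaviour policies.
    rewards, values: shape (T,), shared reward and critic estimate V(x_t).
    bootstrap_value: scalar critic estimate V(x_T) after the last step.
    """
    log_pi = np.asarray(log_pi, dtype=float)
    log_mu = np.asarray(log_mu, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    # Assumption: the joint ratio is the product of per-agent ratios,
    # i.e. the sum of per-agent log-ratios before exponentiation.
    joint_ratio = np.exp((log_pi - log_mu).sum(axis=1))
    rhos = np.minimum(rho_bar, joint_ratio)  # truncated weights for TD errors
    cs = np.minimum(c_bar, joint_ratio)      # truncated "trace" coefficients

    next_values = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * next_values - values)

    # Backward recursion: (v_t - V_t) = delta_t + gamma * c_t * (v_{t+1} - V_{t+1})
    corrections = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

When the behaviour and target policies coincide, every ratio equals 1 and the targets reduce to ordinary n-step returns. When workers act with a stale policy, the truncated weights scale down the affected temporal-difference errors.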
