LG MA, Apr 27, 2021

Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson

arXiv:2104.13446v23.11 citations

Originality: Incremental advance

AI Analysis

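The paper addresses the performance gap between policy gradient and value-based methods, which affects researchers and practitioners using policy gradient methods in multi-agent scenarios. The advance appears incremental because it enhances existing algorithms rather than introducing a new one.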

The paper tackles the sample inefficiency of on-policy policy gradient methods in multi-agent reinforcement learning by introducing semi-on-policy training. The enhanced algorithms show significant performance improvements and match or exceed state-of-the-art value-based methods on the StarCraft Multi-Agent Challenge benchmark.

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between state-of-the-art policy gradient and value-based methods on the popular StarCraft Multi-Agent Challenge (SMAC) benchmark. In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods. We enhance two state-of-the-art policy gradient algorithms with SOP training, demonstrating significant performance improvements. Furthermore, we show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.
