AI · Mar 16, 2023

SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning

Shuhan Qi, Shuhao Zhang, Qiang Wang, Jiajia Zhang, Jing Xiao, Xuan Wang

arXiv:2303.09058v1 · 2.1 · h-index: 43

Originality: Incremental advance

AI Analysis

SVDE addresses sample inefficiency and limited exploration in cooperative multi-agent reinforcement learning, representing an incremental improvement over existing value-decomposition methods.

The paper tackles high sample consumption and the lack of active exploration in value-decomposition methods for cooperative multi-agent reinforcement learning. It proposes SVDE, which achieves the best performance on almost all tested StarCraft II micromanagement maps and accelerates sample collection and policy convergence.

Value-decomposition methods, which reduce the difficulty of a multi-agent system by decomposing the joint state-action space into local observation-action spaces, have become popular in cooperative multi-agent reinforcement learning (MARL). However, value-decomposition methods still suffer from tremendous sample consumption during training and a lack of active exploration. In this paper, we propose a scalable value-decomposition exploration (SVDE) method, which includes a scalable training mechanism, an intrinsic reward design, and explorative experience replay. The scalable training mechanism asynchronously decouples strategy learning from environmental interaction, so as to accelerate sample generation in a MapReduce manner. To address the lack of exploration, an intrinsic reward design and explorative experience replay are proposed, which respectively enhance exploration to produce diverse samples and filter out non-novel samples. Empirically, our method achieves the best performance on almost all maps compared to other popular algorithms in a set of StarCraft II micromanagement games. A data-efficiency experiment also shows that SVDE accelerates sample collection and policy convergence, and we demonstrate the effectiveness of the individual components of SVDE through a set of ablation experiments.
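The abstract does not specify how the intrinsic reward or the explorative replay filter is computed, so the following is only a minimal sketch of the general idea, assuming a simple count-based novelty score. The class `NoveltyReplay`, its parameters (`bonus_scale`, `min_novelty`), and the 1/sqrt(count) bonus are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random
from collections import Counter, deque


class NoveltyReplay:
    """Toy replay buffer with a count-based intrinsic bonus and novelty filter.

    Hypothetical illustration only; this is not the SVDE implementation.
    """

    def __init__(self, capacity=1000, bonus_scale=1.0, min_novelty=0.3):
        self.buffer = deque(maxlen=capacity)  # oldest samples are evicted first
        self.visits = Counter()               # visit count per hashable observation
        self.bonus_scale = bonus_scale        # weight of the intrinsic reward
        self.min_novelty = min_novelty        # samples below this novelty are dropped

    def novelty(self, obs):
        # Assumed novelty score: 1/sqrt(n), where n is the visit count of obs.
        return 1.0 / math.sqrt(self.visits[obs])

    def add(self, obs, action, extrinsic_reward, next_obs):
        """Store a transition unless it is non-novel; return True if stored."""
        self.visits[next_obs] += 1
        nov = self.novelty(next_obs)
        if nov < self.min_novelty:
            return False  # filtered out as non-novel
        # Intrinsic bonus is added on top of the environment reward.
        reward = extrinsic_reward + self.bonus_scale * nov
        self.buffer.append((obs, action, reward, next_obs))
        return True

    def sample(self, batch_size, rng=random):
        """Uniformly sample up to batch_size stored transitions."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

In this sketch, repeatedly visiting the same observation shrinks both the bonus and the chance of being stored, so the buffer skews toward less-visited states. A real multi-agent setting would need a novelty measure over high-dimensional observations rather than exact counts.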
