MA AI LG | Jun 22, 2021

MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning

Zhiwei Xu, Dapeng Li, Yunpeng Bai, Guoliang Fan

arXiv:2106.11652v16.613 citations | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of randomness in multi-agent cooperation for tasks such as gaming or robotics, but it appears incremental because it builds on existing value decomposition methods.

The paper tackles the problem of randomness in cooperative multi-agent reinforcement learning by proposing MMD-MIX, which combines distributional reinforcement learning with value decomposition, and it outperforms prior baselines in the StarCraft Multi-Agent Challenge environment.

In the real world, many tasks require multiple agents to cooperate with each other while each has access only to local observations. To solve such problems, many multi-agent reinforcement learning methods based on Centralized Training with Decentralized Execution have been proposed. One representative class of work is value decomposition, which decomposes the global joint Q-value $Q_\text{jt}$ into individual Q-values $Q_a$ to guide individuals' behaviors, e.g. VDN (Value-Decomposition Networks) and QMIX. However, these baselines often ignore the randomness in the environment. We propose MMD-MIX, a method that combines distributional reinforcement learning and value decomposition to alleviate this weakness. In addition, to improve data sampling efficiency, we draw on REM (Random Ensemble Mixture), a robust RL algorithm, to explicitly introduce randomness into MMD-MIX. The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
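The three ingredients named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not say how MMD-MIX wires these pieces together, so the function names, the Gaussian kernel, the bandwidth, and the toy numbers below are all assumptions chosen for illustration. The sketch shows a VDN-style additive decomposition of $Q_\text{jt}$, a biased sample estimate of squared Maximum Mean Discrepancy between two sets of scalar return samples, and a REM-style random convex combination of several value estimates.

```python
import math
import random


def vdn_joint_q(individual_qs):
    """VDN-style decomposition: the joint Q-value is the sum of per-agent Q-values."""
    return sum(individual_qs)


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an illustrative assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def random_convex_mixture(estimates, rng):
    """REM-style mixing: combine estimates with random non-negative weights summing to 1."""
    raw = [rng.random() for _ in estimates]
    norm = sum(raw)
    weights = [r / norm for r in raw]
    mixed = sum(w * e for w, e in zip(weights, estimates))
    return mixed, weights


# Hypothetical toy data: per-agent Q-values and two sets of return samples.
agent_qs = [0.5, 1.25, -0.25]
predicted_samples = [1.0, 2.0, 3.0]
target_samples = [1.5, 2.5, 3.5]

joint_q = vdn_joint_q(agent_qs)
same_set_mmd = mmd_squared(predicted_samples, predicted_samples)
cross_set_mmd = mmd_squared(predicted_samples, target_samples)
mixed_value, mix_weights = random_convex_mixture([1.0, 2.0, 4.0], random.Random(0))

print("joint Q:", joint_q)
print("MMD^2 (same set):", same_set_mmd)
print("MMD^2 (different sets):", cross_set_mmd)
print("random convex mixture:", mixed_value)
```

In a distributional setting, a quantity like `mmd_squared` could serve as a loss between predicted and target return samples, while the random convex weights resample on every update so that no single estimate dominates. Both uses are stated here as plausible readings of the abstract rather than confirmed details of MMD-MIX.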
