LG MA ML Sep 9, 2020

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao

arXiv:2009.04197v5 7.99 citations

Originality: Incremental advance

AI Analysis

This work aims to improve performance in cooperative multi-agent systems, for tasks such as game playing, but the advance is incremental because it builds on existing methods such as QMIX.

The paper tackles the randomness of long-term returns in cooperative multi-agent reinforcement learning. It proposes QR-MIX, which uses quantile regression to model joint state-action values as a distribution, and reports that QR-MIX outperforms QMIX in the StarCraft Multi-Agent Challenge environment.

In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness. Our proposed model QR-MIX introduces quantile regression, modeling joint state-action values as a distribution, combining QMIX with the Implicit Quantile Network (IQN). However, the monotonicity in QMIX limits the expression of the joint state-action value distribution and may lead to incorrect estimation results in non-monotonic cases. Therefore, we propose a flexible loss function to approximate the monotonicity found in QMIX. Our model is not only more tolerant of the randomness of returns, but also more tolerant of the randomness of monotonic constraints. The experimental results demonstrate that QR-MIX outperforms the previous state-of-the-art method QMIX in the StarCraft Multi-Agent Challenge (SMAC) environment.
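To make the quantile-regression idea in the abstract concrete, here is a minimal sketch of the quantile Huber loss used in the quantile-regression family of distributional RL methods. It is an illustration only, not the authors' implementation. The function names (`huber`, `quantile_huber_loss`, `expected_value`), the threshold `kappa`, and the pure-Python list representation are all assumptions made for this example. The QMIX mixing network and the paper's flexible monotonicity loss are not shown.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    if a <= kappa:
        return 0.5 * u * u
    return kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, taus, target_samples, kappa=1.0):
    """Quantile Huber loss between predicted quantiles and target samples.

    pred_quantiles[i] is the predicted return at quantile fraction taus[i].
    target_samples[j] are samples (or quantiles) of the target return.
    Hypothetical helper for illustration; names are not from the paper.
    """
    if len(pred_quantiles) != len(taus):
        raise ValueError("pred_quantiles and taus must have the same length")
    if not target_samples:
        raise ValueError("target_samples must not be empty")

    total = 0.0
    for theta, tau in zip(pred_quantiles, taus):
        for z in target_samples:
            u = z - theta  # error between target sample and predicted quantile
            # Asymmetric weight: tau when the prediction is too low (u >= 0),
            # 1 - tau when it is too high (u < 0).
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    # Average over target samples, sum over predicted quantiles.
    return total / len(target_samples)


def expected_value(pred_quantiles):
    """Scalar value estimate: the mean of the predicted quantiles."""
    return sum(pred_quantiles) / len(pred_quantiles)


if __name__ == "__main__":
    # For tau = 0.9, predicting too low is penalised more than too high.
    print(quantile_huber_loss([0.0], [0.9], [1.0]))   # prediction too low
    print(quantile_huber_loss([0.0], [0.9], [-1.0]))  # prediction too high
    print(expected_value([1.0, 2.0, 3.0]))
```

With `tau = 0.9`, predicting too low costs about nine times as much as predicting too high by the same amount. This asymmetry pushes each output towards its own quantile of the return distribution rather than towards the mean. A scalar method such as VDN or QMIX keeps only a single value estimate, which in this sketch corresponds to `expected_value`.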

