A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning
This work addresses the stochasticity that partial observability and changing agent policies introduce in multi-agent systems. It is aimed at reinforcement learning researchers, though it appears incremental because it builds on existing factorization methods.
The authors tackle the challenge of high stochasticity in cooperative multi-agent reinforcement learning by proposing DFAC, a unified framework that integrates distributional RL with value function factorization. They demonstrate its effectiveness by outperforming baselines on the Super Hard maps of the StarCraft Multi-Agent Challenge and on six self-designed Ultra Hard maps.
In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address these issues, we propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. Then, we perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge and six self-designed Ultra Hard maps, showing that DFAC outperforms a number of baselines.
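The idea of factorizing a return distribution, rather than only its expectation, can be illustrated with a minimal sketch. It assumes, purely for illustration, that each agent's return distribution is represented by quantile values at shared quantile fractions and that the joint distribution is formed by summing them (an additive, VDN-style mixing). Neither assumption is stated in the abstract above, and the function names are hypothetical; this is not presented as DFAC's actual architecture.

```python
import numpy as np


def joint_quantiles_additive(agent_quantiles):
    """Sum per-agent quantile values at shared quantile fractions.

    ASSUMPTION (not from the abstract): additive mixing of quantile functions.
    agent_quantiles has shape (n_agents, n_quantiles); each row is nondecreasing.
    """
    q = np.asarray(agent_quantiles, dtype=float)
    return q.sum(axis=0)


def expected_value(quantiles):
    """Mean of equally weighted quantile values, used as an estimate of the expected return."""
    return float(np.mean(quantiles))


# Two hypothetical agents, four shared quantile fractions.
agent_quantiles = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 1.0, 5.0],
])

joint = joint_quantiles_additive(agent_quantiles)
joint_mean = expected_value(joint)
sum_of_means = sum(expected_value(q) for q in agent_quantiles)

print(joint)                     # joint quantile values
print(joint_mean, sum_of_means)  # the two means coincide under additive mixing
```

Under these assumptions the mean of the joint quantiles equals the sum of the per-agent means, so an additive expected-value factorization is recovered as a special case, and the summed quantile values remain nondecreasing.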