VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model
The method addresses the scalability and non-stationarity problems of multi-agent systems and improves sample efficiency on cooperative tasks in which agents share a common goal, though the contribution appears incremental because it builds on existing model-based and value decomposition methods.
The paper tackles the high sample complexity of multi-agent reinforcement learning by proposing a model-based approach with a disentangled world model, achieving superior performance and high sample efficiency on benchmarks such as StarCraft II and Multi-Agent MuJoCo.
In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model (VDFD), which aims to let multiple agents interacting in the same environment achieve a common goal with reduced sample complexity. Because of the scalability and non-stationarity problems posed by multi-agent systems, model-free methods require a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to disentangle the complicated environment dynamics. The model produces imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is combined with a value-based framework to predict the joint action-value function and optimize the overall training objective. Experimental results on StarCraft II micro-management, Multi-Agent MuJoCo, and Level-Based Foraging challenges demonstrate that our method achieves high sample efficiency and outperforms other baselines across a wide range of multi-agent learning tasks.
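To make the three-branch idea concrete, the following is a minimal toy sketch, assuming a latent state that is already split into action-conditioned, action-free, and static parts. Everything in it is hypothetical and not taken from the paper: the names (`Latent`, `reparameterize`, `step`, `imagine`, `q_agent`, `q_total`), the hand-written linear dynamics that stand in for learned networks, and the VDN-style additive mixing (Q_tot as the sum of per-agent Q_i) used in place of whatever value decomposition VDFD actually uses. The variational graph auto-encoder is omitted; only the VAE reparameterization step is shown.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Latent:
    """Toy latent state split into three parts (assumed layout, for illustration)."""
    action_conditioned: List[float]  # one coordinate per agent, nudged by its action
    action_free: List[float]         # evolves regardless of the agents' actions
    static: List[float]              # fixed for the whole episode


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """VAE reparameterization trick: z = mu + exp(0.5 * log_var) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def step(z: Latent, joint_action: List[float]) -> Latent:
    """One imagined transition with hypothetical hand-written dynamics per branch."""
    ac = [0.9 * x + 0.1 * a for x, a in zip(z.action_conditioned, joint_action)]
    af = [0.95 * x for x in z.action_free]   # ignores the joint action entirely
    return Latent(ac, af, z.static)          # static branch is carried over unchanged


def imagine(z0: Latent, policy: Callable[[Latent], List[float]], horizon: int) -> List[Latent]:
    """Roll the model forward for `horizon` steps without touching a real environment."""
    traj = [z0]
    for _ in range(horizon):
        traj.append(step(traj[-1], policy(traj[-1])))
    return traj


def q_agent(z: Latent, agent: int, action: float) -> float:
    """Hypothetical per-agent utility Q_i; a real method would use a learned network."""
    features = z.action_conditioned + z.action_free + z.static
    return 0.1 * sum(features) + (agent + 1) * action


def q_total(z: Latent, joint_action: List[float]) -> float:
    """VDN-style additive mixing: Q_tot = sum_i Q_i (a stand-in, not VDFD's mixer)."""
    return sum(q_agent(z, i, a) for i, a in enumerate(joint_action))


# Usage: encode a near-deterministic latent, then imagine three steps for two agents.
rng = random.Random(0)
z0 = Latent(
    action_conditioned=reparameterize([0.0, 0.0], [-20.0, -20.0], rng),
    action_free=[1.0],
    static=[0.5, -0.5],
)
traj = imagine(z0, policy=lambda z: [1.0, -1.0], horizon=3)
print(traj[-1].static)                     # [0.5, -0.5]
print(round(traj[-1].action_free[0], 6))   # 0.857375
print(q_total(traj[-1], [1.0, -1.0]))
```

The additive mixer is used here only because it is the simplest decomposition that turns per-agent values into a joint action-value; a monotonic mixing network would slot into `q_total` without changing the rest of the sketch. Keeping the static branch fixed illustrates why a disentangled model can spend its capacity only on the parts of the state that actually change.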