QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning
QFree addresses a key bottleneck in multi-agent reinforcement learning, with applications such as gaming and robotics, though the contribution appears incremental because it builds on existing value function factorization methods.
The paper tackles the problem of extracting decentralized policies from a centralized joint policy in multi-agent reinforcement learning. It proposes QFree, a universal value function factorization method that satisfies the individual-global-max principle without compromise and achieves state-of-the-art performance on the StarCraft Multi-Agent Challenge benchmark.
Centralized training is widely used in multi-agent reinforcement learning (MARL) to ensure the stability of the training process. Once a joint policy is obtained, it is critical to design a value function factorization method that extracts optimal decentralized policies for the agents, and this factorization must satisfy the individual-global-max (IGM) principle. Imposing additional limitations on the IGM function class can help to meet this requirement, but it comes at the cost of restricting the method's applicability to more complex multi-agent environments. In this paper, we propose QFree, a universal value function factorization method for MARL. We start by developing mathematically equivalent conditions of the IGM principle based on the advantage function, which ensure that the principle holds without any compromise and remove the conservatism of conventional methods. We then establish a more expressive mixing network architecture that can fulfill the equivalent factorization. In particular, a novel loss function is developed by treating the equivalent conditions as a regularization term during policy evaluation in the MARL algorithm. Finally, the effectiveness of the proposed method is verified in a nonmonotonic matrix game scenario. Moreover, we show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment, the StarCraft Multi-Agent Challenge (SMAC).
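To make the IGM requirement concrete, the following minimal Python sketch checks whether a set of per-agent utilities is consistent with a joint value function, in the sense that the greedy joint action equals the tuple of each agent's greedy individual action. It is an illustration only and is not the QFree mixing network or loss. The payoff matrix is an illustrative nonmonotonic two-agent game chosen for this example, and the names `greedy_joint_action`, `greedy_individual_actions`, `satisfies_igm`, `q_good`, and `q_bad` are hypothetical. The sketch also assumes that all maxima are unique.

```python
from itertools import product


def greedy_joint_action(q_tot):
    """Return the joint action (a tuple) with the highest joint value.

    q_tot maps joint-action tuples to values; a unique maximum is assumed.
    """
    return max(q_tot, key=q_tot.get)


def greedy_individual_actions(q_individual):
    """Return each agent's greedy action from its own utility list."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_individual)


def satisfies_igm(q_tot, q_individual):
    """IGM check: joint argmax equals the tuple of individual argmaxes."""
    return greedy_joint_action(q_tot) == greedy_individual_actions(q_individual)


# Illustrative nonmonotonic payoff (an assumption, not taken from the paper):
# the optimum (0, 0) is surrounded by heavily penalized miscoordination.
payoff = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]
q_tot = {(a1, a2): payoff[a1][a2] for a1, a2 in product(range(3), repeat=2)}

# Hypothetical per-agent utilities whose greedy actions recover (0, 0).
q_good = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

# Hypothetical per-agent utilities that drift toward the safe actions,
# as an overly restricted factorization might produce on this game.
q_bad = [[-5.0, 0.1, 0.0], [-5.0, 0.1, 0.0]]

print(satisfies_igm(q_tot, q_good))  # True
print(satisfies_igm(q_tot, q_bad))   # False
```

In this example the second set of utilities selects the joint action (1, 1), whose payoff is 0 rather than the optimal 8, which is the kind of failure that decentralized execution suffers when IGM is violated.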