LG Jun 20, 2024

ME-IGM: Individual-Global-Max in Maximum Entropy Multi-Agent Reinforcement Learning

Wen-Tse Chen, Yuxuan Li, Shiyu Huang, Jiayu Chen, Jeff Schneider

arXiv:2406.13930v42.6 Has Code

Originality: Incremental advance

AI Analysis

ME-IGM addresses a critical limitation in cooperative MARL that is relevant to domains such as gaming and robotics, though the advance is incremental because it builds on existing IGM-based methods.

The paper tackles the misalignment between local policies and the joint policy in maximum entropy multi-agent reinforcement learning, a misalignment that violates the IGM condition used for credit assignment. It proposes ME-IGM, which achieves state-of-the-art performance across 17 scenarios in SMAC-v2 and Overcooked.

Multi-agent credit assignment is a fundamental challenge for cooperative multi-agent reinforcement learning (MARL), where a team of agents learn from shared reward signals. The Individual-Global-Max (IGM) condition is a widely used principle for multi-agent credit assignment, requiring that the joint action determined by individual Q-functions maximizes the global Q-value. Meanwhile, the principle of maximum entropy has been leveraged to enhance exploration in MARL. However, we identify a critical limitation in existing maximum entropy MARL methods: a misalignment arises between local policies and the joint policy that maximizes the global Q-value, leading to violations of the IGM condition. To address this misalignment, we propose an order-preserving transformation. Building on it, we introduce ME-IGM, a novel maximum entropy MARL algorithm compatible with any credit assignment mechanism that satisfies the IGM condition while enjoying the benefits of maximum entropy exploration. We empirically evaluate two variants of ME-IGM: ME-QMIX and ME-QPLEX, in non-monotonic matrix games, and demonstrate their state-of-the-art performance across 17 scenarios in SMAC-v2 and Overcooked.
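
To make the IGM condition in the abstract concrete, the following is a minimal sketch, not the paper's method: it only checks whether the greedy local actions of two agents jointly maximize a global Q-table. The function name `satisfies_igm`, the local utilities `q1` and `q2`, and the payoff values are all hypothetical and chosen for illustration. The additive mixing is in the style of VDN; the second table is a small non-monotonic matrix game used as a counterexample.

```python
import numpy as np

def satisfies_igm(q_joint, q_locals):
    """Return True if the per-agent greedy actions jointly maximize q_joint.

    q_joint  : array of shape (n_actions_1, ..., n_actions_k), the global Q-values
    q_locals : list of k 1-D arrays, one individual Q-function per agent
    """
    # Each agent picks its own greedy action from its individual Q-function.
    greedy = tuple(int(np.argmax(q)) for q in q_locals)
    # IGM holds when that joint action attains the maximum global Q-value.
    return bool(np.isclose(q_joint[greedy], q_joint.max()))

# Hypothetical individual Q-values for two agents with three actions each.
q1 = np.array([1.0, 3.0, 2.0])   # agent 1 prefers action 1
q2 = np.array([0.5, 0.2, 4.0])   # agent 2 prefers action 2

# Additive (VDN-style) mixing: Q_tot(a1, a2) = q1[a1] + q2[a2].
q_additive = q1[:, None] + q2[None, :]
igm_additive = satisfies_igm(q_additive, [q1, q2])

# Hypothetical non-monotonic payoff table: the best joint action is (0, 0),
# which the local utilities above do not select.
q_nonmono = np.array([
    [  8.0, -12.0, -12.0],
    [-12.0,   0.0,   0.0],
    [-12.0,   0.0,   0.0],
])
igm_nonmono = satisfies_igm(q_nonmono, [q1, q2])

print("additive mixing satisfies IGM:", igm_additive)
print("non-monotonic table satisfies IGM:", igm_nonmono)
```

With additive mixing the greedy local actions always recover the joint maximizer, so the check passes. For the non-monotonic table, these particular local utilities select a suboptimal joint action, which is the kind of mismatch the IGM condition rules out.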
