Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus

arXiv:2604.13472 | 66.7 | h-index: 7 | Has Code

AI Analysis

This work addresses the challenges of non-stationarity and coordination in cooperative MARL by reformulating it as a single-agent problem, offering a new perspective for centralized training.

The paper proposes CMAT, a centralized framework that bridges cooperative MARL to hierarchical SARL by using a Transformer to generate a latent consensus vector, enabling order-independent joint action generation. CMAT achieves superior performance over recent baselines on StarCraft II, Multi-Agent MuJoCo, and Google Research Football.

Cooperative multi-agent reinforcement learning (MARL) is widely used to address large joint observation and action spaces by decomposing a centralized control problem into multiple interacting agents. However, such decomposition often introduces additional challenges, including non-stationarity, unstable training, weak coordination, and limited theoretical guarantees. In this paper, we propose the Consensus Multi-Agent Transformer (CMAT), a centralized framework that bridges cooperative MARL to a hierarchical single-agent reinforcement learning (SARL) formulation. CMAT treats all agents as a unified entity and employs a Transformer encoder to process the large joint observation space. To handle the extensive joint action space, we introduce a hierarchical decision-making mechanism in which a Transformer decoder autoregressively generates a high-level consensus vector, simulating the process by which agents reach agreement on their strategies in latent space. Conditioned on this consensus, all agents generate their actions simultaneously, enabling order-independent joint decision making and avoiding the sensitivity to action-generation order in conventional Multi-Agent Transformers (MAT). This factorization allows the joint policy to be optimized using single-agent PPO while preserving expressive coordination through the latent consensus. To evaluate the proposed method, we conduct experiments on benchmark tasks from StarCraft II, Multi-Agent MuJoCo, and Google Research Football. The results show that CMAT achieves superior performance over recent centralized solutions, sequential MARL methods, and conventional MARL baselines. The code for this paper is available at: https://github.com/RS2002/CMAT
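The order-independence claim in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: mean pooling and a single recurrent linear map stand in for the Transformer encoder and decoder, and every name here (`make_consensus`, `agent_action_probs`, `w_step`, `w_head`) is hypothetical. The sketch assumes only what the abstract states: a consensus vector is built step by step, and each agent's action distribution then depends on its own features plus that shared consensus, so the joint log-probability is a plain sum over agents.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    # Matrix-vector product; weights is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def make_consensus(agent_feats, w_step, n_tokens):
    """Toy stand-in for the decoder (assumption, not the paper's model):
    each consensus token depends on pooled agent features and the previous token."""
    dim = len(agent_feats[0])
    # Mean pooling does not depend on the order in which agents are listed.
    pooled = [sum(f[d] for f in agent_feats) / len(agent_feats) for d in range(dim)]
    token = [0.0] * dim
    consensus = []
    for _ in range(n_tokens):  # token-by-token (autoregressive) generation
        token = [math.tanh(v) for v in linear(w_step, pooled + token)]
        consensus.extend(token)
    return consensus

def agent_action_probs(agent_feats, consensus, w_head):
    # Each agent sees its own features plus the shared consensus,
    # never another agent's sampled action.
    return [softmax(linear(w_head, f + consensus)) for f in agent_feats]

def joint_log_prob(probs, actions):
    # Given the consensus, the joint policy factorizes into a sum of per-agent terms.
    return sum(math.log(p[a]) for p, a in zip(probs, actions))

rng = random.Random(0)
n_agents, dim, n_tokens, n_actions = 3, 4, 2, 5
feats = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_agents)]
w_step = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
w_head = [[rng.uniform(-1, 1) for _ in range(dim + n_tokens * dim)]
          for _ in range(n_actions)]

consensus = make_consensus(feats, w_step, n_tokens)
probs = agent_action_probs(feats, consensus, w_head)
actions = [rng.choices(range(n_actions), weights=p)[0] for p in probs]
logp = joint_log_prob(probs, actions)

# Re-list the agents in a different order and score the same joint action.
perm = [2, 0, 1]
feats_p = [feats[i] for i in perm]
probs_p = agent_action_probs(feats_p, make_consensus(feats_p, w_step, n_tokens), w_head)
logp_p = joint_log_prob(probs_p, [actions[i] for i in perm])
print(math.isclose(logp, logp_p, rel_tol=1e-9))
```

In this toy setting, a single scalar `logp` for the whole joint action is the quantity a single-agent PPO ratio would be built from. In a MAT-style sequential decoder, by contrast, each agent's distribution would also be conditioned on the actions of agents earlier in the ordering, so the ordering becomes part of the model.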

