LG · AI · MA · RO · SY · Apr 5, 2024

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

arXiv:2404.03869v2 · 11 citations · h-index: 9 · Neurocomputing

Originality: Incremental advance

AI Analysis

Aimed at MARL researchers and practitioners, this work addresses scalable collaboration in dynamic multi-agent systems such as autonomous vehicle networks. It appears incremental, since it builds on existing PPO-based methods.

The authors tackle zero-shot scalable collaboration in multi-agent systems whose scale fluctuates and which contain multiple roles. They propose SHPPO, a MARL framework that integrates heterogeneity into parameter-shared PPO networks, and report superior performance and improved scalability in environments such as SMAC and GRF.

The emergence of multi-agent reinforcement learning (MARL) is significantly transforming fields such as autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and their scale fluctuates dynamically. Consequently, to achieve zero-shot scalable collaboration, strategies for different roles must be updated flexibly according to the system's scale, which remains a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), which integrates heterogeneity into parameter-shared PPO-based MARL networks. First, we leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer, inserted into the decision-making networks, whose parameters are generated from the learned latent variables. Our approach is scalable because all parameters are shared except those of the heterogeneous layer, and it gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. SHPPO achieves superior performance in classic MARL environments such as the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), shows enhanced zero-shot scalability, and offers insights, through visualization, into how the learned latent variables affect team performance.
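The architecture described above (shared networks plus one layer whose weights are produced from a per-agent latent variable) can be illustrated with a minimal sketch. It assumes the generation step is a hypernetwork-style linear map from the latent variable z to the layer's weights and bias; the class name `SharedHeteroPolicy`, the dimensions, and the tanh latent function are all hypothetical stand-ins, not the authors' implementation (their latent network is learned and may use richer inputs than the current observation). The point it shows: every agent uses the same parameters, yet each gets its own final-layer weights, and the parameter count does not change with team size.

```python
import math
import random

# Hypothetical sizes for illustration only; not taken from the paper.
OBS_DIM, HIDDEN_DIM, LATENT_DIM, N_ACTIONS = 6, 8, 3, 4


def rand_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, mat):
    """Row vector (length = rows) times matrix (rows x cols) -> list of length cols."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SharedHeteroPolicy:
    """Hypothetical sketch: one parameter set shared by all agents. Only the
    final layer's weights differ per agent, because a hypernetwork generates
    them from that agent's latent variable z."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_enc = rand_matrix(OBS_DIM, HIDDEN_DIM, rng)   # shared encoder
        self.w_lat = rand_matrix(OBS_DIM, LATENT_DIM, rng)   # shared latent network
        # Shared hypernetwork: z -> flattened (HIDDEN_DIM x N_ACTIONS) weights + N_ACTIONS biases.
        self.w_hyper = rand_matrix(LATENT_DIM, HIDDEN_DIM * N_ACTIONS + N_ACTIONS, rng)

    def latent(self, obs):
        # Deterministic tanh stand-in for the learned latent network (assumption).
        return [math.tanh(x) for x in matvec(obs, self.w_lat)]

    def act_probs(self, obs):
        hidden = [max(0.0, x) for x in matvec(obs, self.w_enc)]  # ReLU features
        flat = matvec(self.latent(obs), self.w_hyper)            # generated parameters
        # Unflatten into this agent's heterogeneous layer.
        w_het = [flat[i * N_ACTIONS:(i + 1) * N_ACTIONS] for i in range(HIDDEN_DIM)]
        b_het = flat[HIDDEN_DIM * N_ACTIONS:]
        logits = [l + b for l, b in zip(matvec(hidden, w_het), b_het)]
        return softmax(logits)

    def num_params(self):
        """Trainable parameter count; independent of the number of agents."""
        return sum(len(m) * len(m[0]) for m in (self.w_enc, self.w_lat, self.w_hyper))


policy = SharedHeteroPolicy()
rng = random.Random(1)
for n_agents in (3, 7):  # same network, different team sizes, no retraining
    team_obs = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(n_agents)]
    probs = [policy.act_probs(o) for o in team_obs]
    print(n_agents, len(probs), policy.num_params())  # prints "3 3 174", then "7 7 174"
```

Because z is recomputed from each observation, agents with different inputs receive different final-layer weights, and the same agent's weights can change over time; in this sketch that is only a rough analogue of the inter-individual and temporal heterogeneity the abstract describes.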

