MA AI · Dec 12, 2024

GTDE: Grouped Training with Decentralized Execution for Multi-agent Actor-Critic

arXiv:2501.10367v1 · 3.35 citations · h-index: 2 · AAAI

Originality: Incremental advance

AI Analysis

The work addresses scalability issues in large-scale multi-agent systems and is relevant to both researchers and practitioners, though it is an incremental advance because it builds on the existing CTDE and DTDE paradigms.

The paper tackles the performance degradation that multi-agent reinforcement learning suffers as the number of agents grows. It proposes GTDE, a grouped training with decentralized execution paradigm. In a cooperative environment with 495 agents, GTDE increased total reward by an average of 382% over the baseline; in a competitive environment with 64 agents, it achieved a 100% win rate against the baseline.

The rapid advancement of multi-agent reinforcement learning (MARL) has given rise to diverse training paradigms for learning the policy of each agent in a multi-agent system. The paradigms of decentralized training and execution (DTDE) and centralized training with decentralized execution (CTDE) have been proposed and widely applied. However, as the number of agents increases, the inherent limitations of these frameworks significantly degrade performance metrics such as win rate and total reward. To reduce the influence of the growing number of agents on these metrics, we propose a novel training paradigm, grouped training with decentralized execution (GTDE). This framework eliminates the need for a centralized module and relies solely on local information, effectively meeting the training requirements of large-scale multi-agent systems. Specifically, we first introduce an adaptive grouping module, which assigns agents to groups based on their observation histories. To enable end-to-end training, GTDE uses Gumbel-Sigmoid for efficient point-to-point sampling on the grouping distribution while preserving gradient backpropagation. To handle the uncertain number of members in a group, two methods are used to implement a group information aggregation module that merges member information within the group. Empirical results show that in a cooperative environment with 495 agents, GTDE increased the total reward by an average of 382% compared to the baseline. In a competitive environment with 64 agents, GTDE achieved a 100% win rate against the baseline.
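The grouping and aggregation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pairwise `logits[i][j]` layout, the rule that an agent always belongs to its own group, the 0.5 threshold, and the use of mean pooling are all assumptions made for illustration (the abstract does not say which two aggregation methods GTDE uses). The sketch uses only the Python standard library, so it shows the sampling itself; in an autodiff framework the soft sample would typically carry the gradient while the hard mask is used in the forward pass.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample, clamping u away from 0 and 1."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def stable_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, tau, rng):
    """Relaxed Bernoulli sample between 0 and 1 for a single logit.

    The difference of two Gumbel draws is logistic noise; tau is the
    temperature (smaller values push samples toward 0 or 1).
    """
    noise = sample_gumbel(rng) - sample_gumbel(rng)
    return stable_sigmoid((logit + noise) / tau)


def sample_groups(logits, tau=1.0, seed=0):
    """Sample a group mask for every agent (hypothetical helper).

    logits[i][j] is an assumed score for agent j joining agent i's group.
    Returns (soft, hard): relaxed samples and thresholded 0/1 masks.
    """
    rng = random.Random(seed)
    n = len(logits)
    soft = [[gumbel_sigmoid(logits[i][j], tau, rng) for j in range(n)]
            for i in range(n)]
    # Assumption: each agent is always a member of its own group.
    hard = [[1 if (i == j or soft[i][j] > 0.5) else 0 for j in range(n)]
            for i in range(n)]
    return soft, hard


def aggregate_mean(features, mask_row):
    """Mean-pool the feature vectors of the selected group members.

    Mean pooling is one simple way to handle a variable group size;
    it is an assumed stand-in, not necessarily the paper's method.
    """
    dim = len(features[0])
    members = [f for f, m in zip(features, mask_row) if m]
    if not members:
        return [0.0] * dim
    return [sum(f[d] for f in members) / len(members) for d in range(dim)]
```

Calling `sample_groups(logits, tau=0.5, seed=1)` and then `aggregate_mean(features, hard[i])` yields a fixed-size summary for agent `i` regardless of how many members its group has, which is the property the aggregation module needs.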
