LGSep 28, 2024

Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training

arXiv:2409.19391v12 citationsh-index: 8
Originality Incremental advance
AI Analysis

This addresses the problem of high computational costs for researchers and practitioners in multi-agent systems, representing an incremental improvement by adapting dynamic sparse training to MARL.

The paper tackles the computational overhead in deep multi-agent reinforcement learning by proposing a Multi-Agent Sparse Training (MAST) framework, which reduces redundancy by up to 20x in FLOPs with less than 3% performance degradation.

Deep Multi-agent Reinforcement Learning (MARL) relies on neural networks with numerous parameters in multi-agent scenarios, often incurring substantial computational overhead. Consequently, there is an urgent need to expedite training and enable model compression in MARL. This paper proposes the utilization of dynamic sparse training (DST), a technique proven effective in deep supervised learning tasks, to alleviate the computational burdens in MARL training. However, a direct adoption of DST fails to yield satisfactory MARL agents, leading to breakdowns in value learning within deep sparse value-based MARL models. Motivated by this challenge, we introduce an innovative Multi-Agent Sparse Training (MAST) framework aimed at simultaneously enhancing the reliability of learning targets and the rationality of sample distribution to improve value learning in sparse models. Specifically, MAST incorporates the Soft Mellowmax Operator with a hybrid TD-($λ$) schema to establish dependable learning targets. Additionally, it employs a dual replay buffer mechanism to enhance the distribution of training samples. Building upon these aspects, MAST utilizes gradient-based topology evolution to exclusively train multiple MARL agents using sparse networks. Our comprehensive experimental investigation across various value-based MARL algorithms on multiple benchmarks demonstrates, for the first time, significant reductions in redundancy of up to $20\times$ in Floating Point Operations (FLOPs) for both training and inference, with less than $3\%$ performance degradation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes