SY, AI, LG · Sep 26, 2023

Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization

arXiv:2309.14727v1 · 18 citations · h-index: 49
Originality: Highly original
AI Analysis

This paper addresses sample-efficiency and policy-inconsistency problems in multi-agent control systems and represents a strong incremental improvement over existing MARL methods.

The paper tackles limited learning capability and low sample efficiency in multi-agent reinforcement learning (MARL) by proposing MACDPP, which introduces relative entropy regularization into a centralized training with decentralized execution (CTDE) framework. On OpenAI benchmarks and robot arm manipulation tasks, the method shows significantly better learning capability and sample efficiency than both multi-agent and single-agent baselines.

In this paper, a novel Multi-agent Reinforcement Learning (MARL) approach, Multi-Agent Continuous Dynamic Policy Gradient (MACDPP), is proposed to tackle limited learning capability and low sample efficiency in various scenarios controlled by multiple agents. It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure. Evaluated on multi-agent cooperation and competition tasks and on traditional control tasks, including OpenAI benchmarks and robot arm manipulation, MACDPP demonstrates significant superiority in learning capability and sample efficiency over both related multi-agent baselines and widely used single-agent baselines, and therefore expands the potential of MARL for effectively learning challenging control scenarios.
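
The summary gives no equations, so the following is only a generic sketch of what relative entropy (KL) regularization of a policy update looks like, not MACDPP itself (which targets continuous control with an actor-critic). It assumes the standard closed-form result that maximizing E_pi[Q] - (1/eta) * KL(pi || pi_old) over a finite action set gives pi_new(a) proportional to pi_old(a) * exp(eta * Q(a)). The two-agent matrix game, the helpers `kl_regularized_update`, `kl`, `marginal_q`, `simultaneous_step`, `match_prob`, and the step-size parameter `eta` are all hypothetical illustrations.

```python
import math

def kl_regularized_update(prior, q_values, eta):
    """Closed-form maximizer of E_pi[Q] - (1/eta) * KL(pi || prior) on a finite
    action set: pi(a) is proportional to prior(a) * exp(eta * Q(a))."""
    m = max(q_values)  # shift Q by its max for numerical stability; cancels after normalizing
    weights = [p * math.exp(eta * (q - m)) for p, q in zip(prior, q_values)]
    z = sum(weights)
    return [w / z for w in weights]

def kl(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def marginal_q(joint_q, other_policy, agent):
    """Expected shared action value for one agent in a hypothetical 2-agent
    matrix game, averaging joint_q[a0][a1] over the other agent's policy."""
    if agent == 0:
        return [sum(p * row[j] for j, p in enumerate(other_policy)) for row in joint_q]
    return [sum(other_policy[i] * joint_q[i][j] for i in range(len(joint_q)))
            for j in range(len(joint_q[0]))]

def simultaneous_step(joint_q, policies, eta):
    """Both agents update at the same time, each against the other's OLD policy,
    with a KL penalty tying each new policy to its own previous one."""
    q0 = marginal_q(joint_q, policies[1], agent=0)
    q1 = marginal_q(joint_q, policies[0], agent=1)
    return [kl_regularized_update(policies[0], q0, eta),
            kl_regularized_update(policies[1], q1, eta)]

def match_prob(policies):
    """Probability that the two agents pick the same action."""
    return sum(a * b for a, b in zip(policies[0], policies[1]))

if __name__ == "__main__":
    # Hypothetical coordination game: the shared reward is 1 only when actions match.
    joint_q = [[1.0, 0.0],
               [0.0, 1.0]]
    start = [[0.6, 0.4], [0.4, 0.6]]
    for eta in (0.1, 1.0, 10.0):
        new = simultaneous_step(joint_q, start, eta)
        shift = sum(kl(n, o) for n, o in zip(new, start))
        print(f"eta={eta}: match={match_prob(new):.3f} (start {match_prob(start):.3f}), "
              f"total KL shift={shift:.4f}")
```

In this toy case each agent moves toward the action the other is abandoning. With a large `eta` both overshoot, and the probability that they pick the same action falls below its starting value; a small `eta` bounds how far each policy moves in KL terms, which is one intuition for why a relative entropy penalty can ease inconsistent simultaneous updates.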

Code Implementations: 1 repo
