LG, AI, MA · Oct 8, 2023

FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

Lang Feng, Dong Xing, Junru Zhang, Gang Pan

arXiv:2310.05053v12.02 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This paper addresses a specific technical bottleneck in multi-agent reinforcement learning and will mainly interest researchers in that area. It appears incremental because it builds on existing PPO methods.

The paper tackles the inability of existing multi-agent PPO algorithms to handle different types of parameter sharing in cooperative MARL. It proposes FP3O, which outperforms baselines on Multi-Agent MuJoCo and StarCraftII tasks.

Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL that overcomes this limitation. Our approach is built upon the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure formulates the interconnections among agents in a more general manner, i.e., as the interconnections among pipelines, which makes it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and then develop a practical algorithm, Full-Pipeline PPO (FP3O), through several approximations. Empirical evaluations on Multi-Agent MuJoCo and StarCraftII tasks demonstrate that FP3O outperforms other strong baselines and exhibits remarkable versatility across various parameter-sharing configurations.
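The abstract does not spell out which "equivalent decompositions of the advantage function" FP3O uses. The sketch below assumes they resemble the multi-agent advantage decomposition from sequential-update methods such as HAPPO. In that decomposition, the joint advantage A(s, a) splits into per-agent terms A^{i_j}(s, a^{i_{1:j-1}}, a^{i_j}) for any ordering i_1, ..., i_n of the agents, and the terms telescope to the same total. The code checks this numerically on a single state, using a random joint Q-table and independent random per-agent policies. All helper names and numbers are hypothetical, and this is not FP3O's implementation.

```python
import itertools

import numpy as np


def marginal_q(q, policies, kept):
    """Q^S: average the joint Q-table over agents NOT in `kept`,
    weighting each averaged-out agent by its own policy."""
    out = q
    # Reduce axes from highest to lowest so remaining axis indices stay valid.
    for agent in sorted(set(range(q.ndim)) - set(kept), reverse=True):
        out = np.tensordot(out, policies[agent], axes=([agent], [0]))
    return out  # axes correspond to `kept` agents in ascending order


def q_of(q, policies, kept, joint_action):
    """Evaluate Q^S at the actions the agents in `kept` chose."""
    table = marginal_q(q, policies, kept)
    idx = tuple(joint_action[k] for k in sorted(kept))
    return float(table[idx])


def decomposed_advantages(q, policies, order, joint_action):
    """Per-agent terms A^{i_j} = Q^{i_{1:j}} - Q^{i_{1:j-1}} for one ordering."""
    terms = []
    for j in range(len(order)):
        prefix, extended = order[:j], order[: j + 1]
        terms.append(
            q_of(q, policies, extended, joint_action)
            - q_of(q, policies, prefix, joint_action)
        )
    return terms


# Hypothetical toy setup: one state, 3 agents with 2 actions each.
rng = np.random.default_rng(0)
n_agents, n_actions = 3, 2
q = rng.normal(size=(n_actions,) * n_agents)
policies = [rng.dirichlet(np.ones(n_actions)) for _ in range(n_agents)]
joint_action = (1, 0, 1)

v = q_of(q, policies, (), joint_action)  # V(s): expectation under all policies
joint_adv = float(q[joint_action]) - v   # A(s, a) = Q(s, a) - V(s)

# Every ordering gives different per-agent terms but the same total.
for order in itertools.permutations(range(n_agents)):
    terms = decomposed_advantages(q, policies, order, joint_action)
    assert np.isclose(sum(terms), joint_adv)
    print(order, np.round(terms, 3), round(sum(terms), 6))
```

The per-agent terms generally change with the ordering while their sum does not. That is the sense in which several decompositions are "equivalent", and it is what allows several optimization pipelines to run side by side.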
