AI, LG, MA | May 20, 2022

Learning Progress Driven Multi-Agent Curriculum

arXiv:2205.10016v3 | 2 citations | h-index: 20
Originality: Highly original
AI Analysis

This work addresses curriculum learning in multi-agent systems. The contribution is incremental in that it builds on existing automatic curriculum design methods, adding a new measure to improve them.

The paper tackles automatic control of task difficulty in multi-agent reinforcement learning, using the number of agents as the curriculum variable. It proposes a TD-error based learning progress measure to address the high variance of episode returns and the worsened credit assignment problem, and it reports outperforming state-of-the-art baselines on three sparse-reward benchmarks.

The number of agents can be an effective curriculum variable for controlling the difficulty of multi-agent reinforcement learning (MARL) tasks. Existing work typically uses manually defined curricula such as linear schemes. We identify two potential flaws when applying existing reward-based automatic curriculum learning methods in MARL: (1) the expected episode return used to measure task difficulty has high variance; (2) credit assignment difficulty can be exacerbated in tasks where increasing the number of agents yields higher returns, which is common in many MARL tasks. To address these issues, we propose to control the curriculum by using a TD-error based *learning progress* measure and by letting the curriculum proceed from an initial context distribution to the final task-specific one. Since our approach maintains a distribution over the number of agents and measures learning progress rather than absolute performance, which often increases with the number of agents, we alleviate problem (2). Moreover, the learning progress measure naturally alleviates problem (1) by aggregating returns. In three challenging sparse-reward MARL benchmarks, our approach outperforms state-of-the-art baselines.
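The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that learning progress for each agent count can be approximated by a running mean of absolute TD errors, that the progress-driven distribution is a softmax over those values, and that the move toward the final task distribution is a linear blend controlled by a hypothetical `mix` parameter. The class name, the method names, and all parameters are illustrative.

```python
import math
import random


class LearningProgressCurriculum:
    """Illustrative curriculum over the number of agents (assumed design, not the paper's)."""

    def __init__(self, agent_counts, target_probs, ema=0.9, temperature=1.0):
        self.agent_counts = list(agent_counts)
        self.target_probs = list(target_probs)  # final task-specific distribution
        self.ema = ema                          # smoothing factor (assumption)
        self.temperature = temperature          # softmax temperature (assumption)
        # Running mean of |TD error| per agent count, used as a learning-progress proxy.
        self.progress = {n: 0.0 for n in self.agent_counts}

    def update(self, n_agents, td_errors):
        """Fold a batch of TD errors collected with n_agents into the running estimate."""
        batch_mean = sum(abs(e) for e in td_errors) / len(td_errors)
        old = self.progress[n_agents]
        self.progress[n_agents] = self.ema * old + (1.0 - self.ema) * batch_mean

    def distribution(self, mix):
        """Blend a softmax over learning progress with the target distribution.

        mix = 0 gives the purely progress-driven distribution,
        mix = 1 gives the target distribution.
        """
        scores = [self.progress[n] / self.temperature for n in self.agent_counts]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        lp_probs = [e / total for e in exps]
        return [(1.0 - mix) * p + mix * q for p, q in zip(lp_probs, self.target_probs)]

    def sample(self, mix, rng=random):
        """Draw the number of agents for the next training episode."""
        return rng.choices(self.agent_counts, weights=self.distribution(mix), k=1)[0]


# Usage: three candidate team sizes, with the final task using 8 agents.
curriculum = LearningProgressCurriculum([2, 4, 8], target_probs=[0.0, 0.0, 1.0])
curriculum.update(2, [0.5, -0.3, 0.1])
curriculum.update(4, [1.2, -0.9])
probs = curriculum.distribution(mix=0.0)
n_agents = curriculum.sample(mix=0.5)
```

Because the sampling weights depend on the magnitude of TD errors rather than on raw returns, a team size that yields higher returns simply by having more agents is not automatically favoured in this sketch, which mirrors the motivation given for problem (2).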

Code Implementations: 1 repo
