LG | AI | MA | Aug 19, 2023

Never Explore Repeatedly in Multi-Agent Reinforcement Learning

Harvard, Tsinghua
arXiv:2308.09909v1 | 1 citation | h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses exploration inefficiencies in multi-agent reinforcement learning, particularly for sparse-reward tasks, but appears incremental as it builds on existing intrinsic motivation methods.

The paper tackles the "revisitation" problem in multi-agent reinforcement learning, where agents repeatedly explore the same limited areas because the neural network approximators used to compute intrinsic rewards have limited expressive capacity. It proposes a dynamic reward scaling method that stabilizes intrinsic rewards and promotes broader exploration. The authors report improved performance in challenging sparse-reward environments such as Google Research Football and StarCraft II.

In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration. While the computation of many intrinsic rewards relies on estimating variational posteriors using neural network approximators, a notable challenge has surfaced due to the limited expressive capability of these neural statistics approximators. We pinpoint this challenge as the "revisitation" issue, where agents recurrently explore confined areas of the task space. To combat this, we propose a dynamic reward scaling approach. This method is crafted to stabilize the significant fluctuations in intrinsic rewards in previously explored areas and promote broader exploration, effectively curbing the revisitation phenomenon. Our experimental findings underscore the efficacy of our approach, showcasing enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks, especially in sparse reward settings.
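
The abstract does not give the scaling rule itself, so the following is only a generic illustration of what "scaling an intrinsic reward down in already-explored areas" can look like, not the paper's method. It assumes a hashable state key, a visit-count table, and an inverse-square-root decay; the class name `VisitScaledIntrinsicReward`, the function `total_reward`, and the weight `beta` are all hypothetical.

```python
import math
from collections import defaultdict


class VisitScaledIntrinsicReward:
    """Hypothetical sketch: damp a raw intrinsic reward by how often a state was seen.

    This is a simple count-based stand-in, NOT the scaling rule from the paper.
    """

    def __init__(self):
        # Number of times each (hashable) state key has been visited.
        self.counts = defaultdict(int)

    def scale(self, state_key, raw_intrinsic):
        # Record the visit, then shrink the reward by 1 / sqrt(visit count),
        # so repeated visits to the same region pay less and less.
        self.counts[state_key] += 1
        return raw_intrinsic / math.sqrt(self.counts[state_key])


def total_reward(extrinsic, scaled_intrinsic, beta=0.1):
    # Assumed combination: environment reward plus a weighted intrinsic bonus.
    return extrinsic + beta * scaled_intrinsic


scaler = VisitScaledIntrinsicReward()
first_visit = scaler.scale((0, 0), 1.0)   # full bonus on the first visit
second_visit = scaler.scale((0, 0), 1.0)  # smaller bonus on a revisit
new_state = scaler.scale((3, 1), 1.0)     # unseen state gets the full bonus again
```

In this sketch a revisited state earns a shrinking bonus while an unseen state still earns the full one, which is the qualitative behaviour the abstract describes as curbing revisitation.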

