LG, AI, MA, ML · May 28, 2019

Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

arXiv:1905.12127v3 · 58 citations
Originality: Highly original
AI Analysis

This paper addresses inefficient exploration in cooperative multi-agent systems. It presents a novel method rather than an incremental improvement on existing ones.

The paper tackles the challenge of sparse rewards in cooperative multi-agent reinforcement learning by introducing a framework for intrinsic rewards that enable agents to coordinate their exploration, resulting in improved performance in domains where state-of-the-art methods fail.

Solving tasks with sparse rewards is one of the most important challenges in reinforcement learning. In the single-agent setting, this challenge is addressed by introducing intrinsic rewards that motivate agents to explore unseen regions of their state spaces; however, applying these techniques naively to the multi-agent setting results in agents exploring independently, without any coordination among themselves. Exploration in cooperative multi-agent settings can be accelerated and improved if agents coordinate their exploration. In this paper we introduce a framework for designing intrinsic rewards which consider what other agents have explored such that the agents can coordinate. Then, we develop an approach for learning how to dynamically select between several exploration modalities to maximize extrinsic rewards. Concretely, we formulate the approach as a hierarchical policy where a high-level controller selects among sets of policies trained on diverse intrinsic rewards and the low-level controllers learn the action policies of all agents under these specific rewards. We demonstrate the effectiveness of the proposed approach in cooperative domains with sparse rewards where state-of-the-art methods fail and challenging multi-stage tasks that necessitate changing modes of coordination.
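To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a count-based novelty measure, shows two illustrative ways of combining novelty across agents (`"independent"` ignores the other agents, `"minimum"` rewards a state only if no agent has explored it), and replaces the learned high-level controller with a simple running-average selector over extrinsic returns. All names (`novelty`, `intrinsic_rewards`, `ModeSelector`) are hypothetical.

```python
import math
from collections import Counter


def novelty(counts, state):
    """Count-based novelty (an assumed choice): high for rarely visited states."""
    return 1.0 / math.sqrt(counts[state] + 1)


def intrinsic_rewards(counts_per_agent, states, mode):
    """Return one intrinsic reward per agent.

    counts_per_agent[j] is agent j's visit counter and states[i] is agent i's
    current state, so nov[i][j] is how novel agent j finds agent i's state.
    """
    n = len(states)
    nov = [[novelty(counts_per_agent[j], states[i]) for j in range(n)]
           for i in range(n)]
    if mode == "independent":  # each agent uses only its own counts
        return [nov[i][i] for i in range(n)]
    if mode == "minimum":      # rewarded only if novel to every agent
        return [min(nov[i]) for i in range(n)]
    raise ValueError(f"unknown mode: {mode}")


class ModeSelector:
    """Stand-in for the high-level controller (assumption: a greedy
    running-average rule, not the learned controller the paper describes)."""

    def __init__(self, modes):
        self.totals = {m: 0.0 for m in modes}
        self.pulls = {m: 0 for m in modes}

    def update(self, mode, extrinsic_return):
        # Record the extrinsic return obtained while using this mode.
        self.totals[mode] += extrinsic_return
        self.pulls[mode] += 1

    def best(self):
        # Untried modes are preferred; otherwise pick the highest mean return.
        def score(m):
            if self.pulls[m] == 0:
                return float("inf")
            return self.totals[m] / self.pulls[m]
        return max(self.totals, key=score)
```

For example, if agent 0 has never visited state `"b"` but agent 1 has visited it eight times, the `"independent"` mode still pays agent 0 the full novelty bonus for entering `"b"`, while the `"minimum"` mode pays it only the smaller bonus implied by agent 1's counts. This is the sense in which the reward "considers what other agents have explored".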

Code Implementations: 1 repo
