LG · MA · ML · Oct 12, 2019

Influence-Based Multi-Agent Exploration

arXiv:1910.05512v1 · 157 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient exploration in multi-agent systems, a problem that matters for applications such as robotics and game AI. The contribution appears incremental, however, because it builds on existing work in intrinsically motivated RL.

The paper tackles the exploration challenge in sparse-reward multi-agent reinforcement learning by proposing two methods, EITI and EDTI, that use influence-based intrinsic rewards to encourage coordinated exploration, resulting in significant performance gains in various multi-agent scenarios.

Intrinsically motivated reinforcement learning aims to address the exploration challenge for sparse-reward tasks. However, the study of exploration methods in transition-dependent multi-agent settings is largely absent from the literature. We aim to take a step towards solving this problem. We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), by exploiting the role of interaction in coordinated behaviors of agents. EITI uses mutual information to capture influence transition dynamics. EDTI uses a novel intrinsic reward, called Value of Interaction (VoI), to characterize and quantify the influence of one agent's behavior on expected returns of other agents. By optimizing the EITI or EDTI objective as a regularizer, agents are encouraged to coordinate their exploration and learn policies to optimize team performance. We show how to optimize these regularizers so that they can be easily integrated with policy gradient reinforcement learning. The resulting update rule draws a connection between coordinated exploration and intrinsic reward distribution. Finally, we empirically demonstrate the significant strength of our methods in a variety of multi-agent scenarios.
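To make "mutual information to capture influence transition dynamics" concrete, here is a minimal sketch for a two-agent, fully discrete setting. It assumes (our reading, not a quote from the paper) that agent 1's influence on agent 2 is measured as the conditional mutual information I(S2'; S1, A1 | S2, A2), that is, how much agent 1's state and action tell us about agent 2's next state beyond what agent 2's own state and action already do. It uses a simple plug-in (count-based) estimate, which may differ from how the paper estimates or optimizes this quantity. The function name `conditional_mutual_information` and the transition tuple layout are hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(transitions):
    """Plug-in estimate of I(S2'; S1, A1 | S2, A2) in nats.

    Hypothetical helper: `transitions` is an iterable of
    (s1, a1, s2, a2, s2_next) tuples with discrete, hashable entries.
    """
    data = list(transitions)
    n = len(data)
    if n == 0:
        return 0.0

    # Joint and marginal counts for X = s2_next, Y = (s1, a1), Z = (s2, a2).
    c_xyz, c_xz, c_yz, c_z = Counter(), Counter(), Counter(), Counter()
    for s1, a1, s2, a2, s2_next in data:
        x, y, z = s2_next, (s1, a1), (s2, a2)
        c_xyz[(x, y, z)] += 1
        c_xz[(x, z)] += 1
        c_yz[(y, z)] += 1
        c_z[z] += 1

    # I(X; Y | Z) = sum p(x,y,z) * log[p(z) p(x,y,z) / (p(x,z) p(y,z))].
    # The 1/n normalizers cancel inside the log, so raw counts suffice there.
    mi = 0.0
    for (x, y, z), count in c_xyz.items():
        ratio = c_z[z] * count / (c_xz[(x, z)] * c_yz[(y, z)])
        mi += (count / n) * log(ratio)
    return mi
```

If agent 2's next state simply copies agent 1's binary action, the estimate is log 2 (about 0.693 nats): agent 1 fully determines agent 2's transition. If agent 2's next state depends only on its own action, the estimate is 0, so an influence-based bonus would give no credit for that interaction.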

Code implementations: 1 repo
