MA, LG
Apr 19, 2019

Teaching on a Budget in Multi-Agent Deep Reinforcement Learning

arXiv:1905.01357v2
37 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses sample efficiency in cooperative decentralized multi-agent reinforcement learning. It is incremental: it builds on existing teacher-student frameworks and adds limited novelty.

The paper tackles the problem of poor sample efficiency in multi-agent deep reinforcement learning by proposing heuristics-based action advising techniques within a teacher-student framework, showing experimental results in a gridworld environment that suggest the approach is useful and warrants further investigation.

Deep Reinforcement Learning (RL) algorithms can solve complex sequential decision tasks successfully. However, they suffer from poor sample efficiency, which can often be tackled by knowledge reuse. In Multi-Agent Reinforcement Learning (MARL) this drawback becomes worse, but at the same time, agent interactions present a new set of opportunities to leverage knowledge. One promising approach among these is peer-to-peer action advising through a teacher-student framework. Although it was originally introduced for single-agent RL, recent studies show that it can also be applied to multi-agent scenarios with promising empirical results. However, studies in this line of research are currently very limited. In this paper, we propose heuristics-based action advising techniques in cooperative decentralised MARL, using a task-level policy based on nonlinear function approximation. By adopting the Random Network Distillation technique, we devise a measure that lets agents assess their knowledge in any given state and initiate the teacher-student dynamics with no prior role assumptions. Experimental results in a gridworld environment show that such an approach may indeed be useful and warrants further investigation.
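
The abstract does not spell out the heuristics, so the following is only a minimal sketch of the general idea: each agent keeps a Random Network Distillation style error as a proxy for how well it knows a state, and advice is exchanged when one agent is unsure, another is confident, and both still have budget. The class and function names, the thresholds, the budgets, the one-hot gridworld encoding, and the linear predictor are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np


class RNDUncertainty:
    """Per-agent novelty estimate in the spirit of Random Network Distillation.

    A fixed random target network maps a state to a feature vector, and a
    predictor is trained to match it on the states the agent visits. The
    prediction error stays high for states the agent has rarely seen.
    """

    def __init__(self, obs_dim, feat_dim=8, hidden=16, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialised and never trained.
        self.w1 = rng.normal(size=(obs_dim, hidden))
        self.w2 = rng.normal(size=(hidden, feat_dim)) / np.sqrt(hidden)
        # Predictor: linear here for brevity (assumption; a deep network
        # would be the usual choice for richer observations).
        self.p = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _target(self, obs):
        return np.tanh(obs @ self.w1) @ self.w2

    def uncertainty(self, obs):
        """Mean squared error between predictor and target for one state."""
        err = obs @ self.p - self._target(obs)
        return float(np.mean(err ** 2))

    def update(self, obs):
        """One gradient step on 0.5 * squared error for the visited state."""
        err = obs @ self.p - self._target(obs)
        self.p -= self.lr * np.outer(obs, err)


def should_ask(own_uncertainty, ask_threshold, ask_budget):
    """Student side: ask only when unsure and asking budget remains."""
    return ask_budget > 0 and own_uncertainty > ask_threshold


def should_give(own_uncertainty, give_threshold, give_budget):
    """Teacher side: advise only when confident and giving budget remains."""
    return give_budget > 0 and own_uncertainty < give_threshold


def one_hot(index, size):
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


# Usage: agent A has visited state 3 often, agent B never has.
n_states = 25  # hypothetical 5x5 gridworld, one-hot encoded
agent_a = RNDUncertainty(n_states, seed=0)
agent_b = RNDUncertainty(n_states, seed=1)
state = one_hot(3, n_states)
for _ in range(50):
    agent_a.update(state)

# Thresholds and budgets below are placeholder values.
b_asks = should_ask(agent_b.uncertainty(state), ask_threshold=0.01, ask_budget=5)
a_gives = should_give(agent_a.uncertainty(state), give_threshold=0.01, give_budget=5)
advice_exchanged = b_asks and a_gives
```

Because roles are decided per state from each agent's own error, the same agent can act as student in one state and teacher in another, which matches the "no prior role assumptions" wording of the abstract.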

