LG · ML · Mar 12, 2020

Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft

arXiv:2003.06066v1 · 28 citations
Originality: Incremental advance
AI Analysis

This paper addresses sample inefficiency, an obstacle to real-world reinforcement learning applications, though the advance is incremental because it builds on existing demonstration-based methods.

The paper tackles sample inefficiency in deep reinforcement learning by using human demonstrations to improve agent performance in the Minecraft minigame ObtainDiamond. Its best agent achieved a mean score of 48 with only 8M frames of environment interaction, and the solution placed 3rd in the NeurIPS MineRL Competition.

Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.
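
The two-stage procedure described in the abstract (pretrain on human data, then fine-tune by reinforcement learning with an extra loss against catastrophic forgetting) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: a single-state softmax policy stands in for the policy networks, plain REINFORCE stands in for the RL algorithm, the additional loss is assumed to be a behavioural-cloning cross-entropy on the demonstrations, and the names `pretrain`, `finetune`, and `bc_weight` are hypothetical. The policy exploitation mechanism and experience replay are not modelled.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_gradient(logits, demo_actions):
    """Gradient of the mean cross-entropy between the policy and demonstrated actions."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in demo_actions:
        for i in range(len(logits)):
            grad[i] += probs[i] - (1.0 if i == a else 0.0)
    return [g / len(demo_actions) for g in grad]


def pretrain(logits, demo_actions, lr=0.5, steps=200):
    """Stage 1: supervised training on human demonstrations only."""
    logits = list(logits)
    for _ in range(steps):
        grad = bc_gradient(logits, demo_actions)
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return logits


def finetune(logits, demo_actions, reward_fn, rng, lr=0.1, steps=500, bc_weight=0.5):
    """Stage 2: REINFORCE updates plus a weighted demonstration loss.

    The demonstration term (an assumed form of the anti-forgetting loss)
    keeps the policy close to the human data while rewards reshape it.
    """
    logits = list(logits)
    n = len(logits)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n), weights=probs)[0]
        reward = reward_fn(action)
        # Gradient of -reward * log pi(action) with respect to the logits.
        rl_grad = [reward * (probs[i] - (1.0 if i == action else 0.0)) for i in range(n)]
        bc_grad = bc_gradient(logits, demo_actions)
        logits = [x - lr * (r + bc_weight * b) for x, r, b in zip(logits, rl_grad, bc_grad)]
    return logits
```

Calling `pretrain([0.0, 0.0, 0.0], demos)` and passing the result to `finetune` with a reward function and a seeded `random.Random` gives a policy that was first fitted to the demonstrations and then adjusted by reward. Setting `bc_weight=0.0` removes the demonstration term, which is one way to see what the extra loss contributes in this toy setting.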

Code Implementations: 3 repos
