LG · AI · Dec 7, 2021

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

arXiv:2112.04907v1 · 41 citations
Originality: Highly original
AI Analysis

The paper addresses sample inefficiency in reinforcement learning for complex, open-world games. The contribution is incremental in that it builds on hierarchical methods and adds new techniques.

The paper tackles the challenge of learning rational behaviors in open-world games like Minecraft by proposing JueWu-MC, a sample-efficient hierarchical reinforcement learning approach that won the NeurIPS MineRL 2021 championship and achieved the highest performance score ever.

Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research due to the compound challenge of partial observability, high-dimensional visual perception, and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.
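The two-level control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the sub-task names, the inventory-style state, and the hand-written rules standing in for the learned high-level policy and low-level workers are all assumptions made for illustration. It only shows the pattern of a controller choosing an option and a worker acting until that sub-task terminates.

```python
from typing import Callable, Dict, List, Tuple

# Toy state: inventory counts plus chopping progress (all hypothetical).
State = Dict[str, int]


def high_level_controller(state: State) -> str:
    """Choose the next option (sub-task). A hand-written rule stands in
    for the learned high-level policy."""
    if state["planks"] >= 4:
        return "craft_table"
    if state["wood"] >= 1:
        return "craft_planks"
    return "gather_wood"


def gather_wood(state: State) -> Tuple[State, bool]:
    """Low-level worker: three primitive 'chop' steps yield one wood."""
    new = dict(state, chop=state["chop"] + 1)
    if new["chop"] == 3:
        new["wood"] += 1
        new["chop"] = 0
        return new, True  # sub-task finished
    return new, False


def craft_planks(state: State) -> Tuple[State, bool]:
    """Low-level worker: one step turns one wood into four planks."""
    return dict(state, wood=state["wood"] - 1, planks=state["planks"] + 4), True


def craft_table(state: State) -> Tuple[State, bool]:
    """Low-level worker: one step turns four planks into one table."""
    return dict(state, planks=state["planks"] - 4, table=state["table"] + 1), True


WORKERS: Dict[str, Callable[[State], Tuple[State, bool]]] = {
    "gather_wood": gather_wood,
    "craft_planks": craft_planks,
    "craft_table": craft_table,
}


def run_episode(max_steps: int = 50) -> Tuple[State, List[str], int]:
    """Alternate between option selection and low-level execution."""
    state: State = {"wood": 0, "planks": 0, "table": 0, "chop": 0}
    trace: List[str] = []  # sequence of options chosen by the controller
    steps = 0
    while state["table"] == 0 and steps < max_steps:
        option = high_level_controller(state)
        trace.append(option)
        done = False
        # The selected worker keeps control until its sub-task terminates.
        while not done and steps < max_steps:
            state, done = WORKERS[option](state)
            steps += 1
    return state, trace, steps


if __name__ == "__main__":
    final_state, options_used, n_steps = run_episode()
    print(options_used, n_steps, final_state)
```

In a learned system, the controller and each worker would be trained policies operating on visual observations rather than rules over an inventory dictionary; the loop structure is the only part this sketch is meant to convey.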

