Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning
This work addresses the slow, sample-inefficient learning of deep reinforcement learning in robotics and game-playing tasks by exploiting assumptions about how objects behave.
The paper tackles poor sample efficiency in deep reinforcement learning by introducing the HyPE algorithm, which discovers objects, generates hypotheses about their controllability, and learns hierarchical skills. In simulated robotic block-pushing and Breakout domains, HyPE learns high-scoring policies an order of magnitude faster than several state-of-the-art methods.
Deep reinforcement learning (DRL) is capable of learning high-performing policies on a variety of complex high-dimensional tasks, ranging from video games to robotic manipulation. However, standard DRL methods often suffer from poor sample efficiency, partially because they aim to be entirely problem-agnostic. In this work, we introduce a novel approach to exploration and hierarchical skill learning that derives its sample efficiency from intuitive assumptions about the behavior of objects, both in the physical world and in simulations that mimic physics. Specifically, we propose the Hypothesis Proposal and Evaluation (HyPE) algorithm, which discovers objects from raw pixel data, generates hypotheses about the controllability of observed changes in object state, and learns a hierarchy of skills to test these hypotheses. We demonstrate that HyPE can dramatically improve the sample efficiency of policy learning in two domains: a simulated robotic block-pushing task and the popular benchmark game Breakout. In these domains, HyPE learns high-scoring policies an order of magnitude faster than several state-of-the-art reinforcement learning methods.
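To make the idea of a "hypothesis about controllability" concrete, the following is a minimal toy sketch and not the HyPE algorithm itself. It assumes object states have already been extracted (no pixel processing) and scores each object by how much of the variance in its displacement is explained by the agent's action. The function names, the two-object toy world, and the variance-ratio score are all illustrative assumptions that do not come from the paper.

```python
import random
from collections import defaultdict


def controllability_score(transitions):
    """Fraction of displacement variance explained by the action (0 to 1).

    transitions: list of (action, displacement) pairs for one object.
    This variance-ratio test is a stand-in chosen for illustration only.
    """
    displacements = [d for _, d in transitions]
    mean = sum(displacements) / len(displacements)
    total = sum((d - mean) ** 2 for d in displacements)
    if total == 0:
        return 0.0  # the object never changes, so there is nothing to control

    # Group displacements by the action that was taken.
    by_action = defaultdict(list)
    for action, d in transitions:
        by_action[action].append(d)

    # Between-group sum of squares: variation attributable to the action.
    between = sum(
        len(ds) * (sum(ds) / len(ds) - mean) ** 2 for ds in by_action.values()
    )
    return between / total


def collect_toy_transitions(steps=2000, seed=0):
    """Hypothetical 1-D world: a paddle that obeys actions, a ball that does not."""
    rng = random.Random(seed)
    paddle, ball = [], []
    for _ in range(steps):
        action = rng.choice([-1, 0, 1])
        paddle.append((action, action))              # moves exactly as commanded
        ball.append((action, rng.choice([-1, 1])))   # moves independently of action
    return {"paddle": paddle, "ball": ball}


scores = {
    name: controllability_score(ts)
    for name, ts in collect_toy_transitions().items()
}
for name, score in scores.items():
    print(f"{name}: controllability score = {score:.3f}")
```

In this toy setting the paddle scores close to 1 and the ball close to 0, so only the paddle would be proposed as directly controllable. A hierarchical method would then go on to ask whether the ball's changes can be explained by the paddle, which this sketch does not attempt.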