Abstract Value Iteration for Hierarchical Reinforcement Learning
This paper addresses control problems with continuous state and action spaces for reinforcement learning practitioners, offering a hierarchical framework that comes with theoretical guarantees and empirical improvements over existing hierarchical methods.
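The authors tackle hierarchical reinforcement learning in continuous state and action spaces by learning options that transition between user-specified subgoal regions and planning in the resulting abstract decision process (ADP). Because the ADP may not be Markov, they propose two planning algorithms: a conservative one with theoretical guarantees and a practical one that interleaves abstract planning with concrete learning. Their approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.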
We propose a novel hierarchical reinforcement learning framework for control with continuous state and action spaces. In our framework, the user specifies subgoal regions, which are subsets of states; then, we (i) learn options that serve as transitions between these subgoal regions, and (ii) construct a high-level plan in the resulting abstract decision process (ADP). A key challenge is that the ADP may not be Markov, which we address by proposing two algorithms for planning in the ADP. Our first algorithm is conservative, allowing us to prove theoretical guarantees on its performance, which help inform the design of subgoal regions. Our second algorithm is a practical one that interweaves planning at the abstract level and learning at the concrete level. In our experiments, we demonstrate that our approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.
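To make the idea of planning over subgoal regions concrete, the following is a minimal sketch of ordinary value iteration on a small abstract graph. It is not either of the two algorithms proposed in the paper: it assumes the abstract process is Markov, which the abstract notes may not hold, and it assumes each option already has a fixed estimated success probability and expected reward. The region names, the numbers, and the function `abstract_value_iteration` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical abstract graph: nodes are subgoal regions, edges are options.
# Each edge holds an ASSUMED (success_probability, expected_reward) estimate.
Edge = Tuple[float, float]
Graph = Dict[str, Dict[str, Edge]]


def abstract_value_iteration(
    edges: Graph,
    goal: str,
    gamma: float = 0.9,
    tol: float = 1e-10,
    max_iters: int = 1000,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Plain value iteration over subgoal regions (assumes a Markov abstraction)."""
    # Collect every region that appears as a source, a target, or the goal.
    nodes = set(edges) | {t for out in edges.values() for t in out} | {goal}
    values = {n: 0.0 for n in nodes}

    def backup(out: Dict[str, Edge], target: str) -> float:
        # Value of taking the option toward `target`; failure is worth 0 here.
        p, r = out[target]
        return r + gamma * p * values[target]

    for _ in range(max_iters):
        delta = 0.0
        for s in nodes:
            out = edges.get(s, {})
            if s == goal or not out:
                continue  # the goal and dead-end regions keep value 0
            best = max(backup(out, t) for t in out)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy high-level plan: for each region, the best next subgoal region.
    plan = {
        s: max(out, key=lambda t: backup(out, t))
        for s, out in edges.items()
        if out and s != goal
    }
    return values, plan


# Toy example with made-up estimates.
toy_edges: Graph = {
    "start": {"A": (0.9, 0.0), "B": (0.5, 0.0)},
    "A": {"goal": (0.8, 0.8)},
    "B": {"goal": (1.0, 1.0)},
}
values, plan = abstract_value_iteration(toy_edges, goal="goal")
print(plan["start"], round(values["start"], 3))
```

In this toy graph the plan routes through region A even though the option from B reaches the goal more reliably, because the option from the start region into A succeeds far more often. A sketch like this ignores the dependence of option outcomes on where inside a region the agent starts, which is the source of the non-Markov behavior that the paper's two algorithms are designed to handle.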