Hierarchical Linearly-Solvable Markov Decision Problems
This work addresses the challenge of scaling reinforcement learning to complex, hierarchical tasks for applications such as robotics and autonomous systems, though it appears incremental by building on existing LMDP frameworks.
The paper tackles the problem of hierarchical reinforcement learning in large state spaces by formulating tasks as linearly-solvable Markov decision processes (LMDPs) with analytical solutions, and it shows that the proposed hierarchical Z-learning algorithm significantly outperforms state-of-the-art methods in domains like the taxi and autonomous guided vehicle tasks.
We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs) have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative to solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms the state-of-the-art learning methods in two classical hierarchical reinforcement learning domains: the taxi domain and an autonomous guided vehicle task.