Provable Hierarchy-Based Meta-Reinforcement Learning
This work addresses a foundational gap in hierarchical reinforcement learning by providing theoretical guarantees for learning hierarchies, which is significant for researchers and practitioners in AI seeking reliable and efficient methods for complex behavior learning.
The paper tackles the lack of provable guarantees in hierarchical reinforcement learning by analyzing it in a meta-RL setting, providing diversity conditions and an algorithm that ensures sample-efficient recovery of latent hierarchical structure and offers regret bounds for downstream tasks.
Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assume access to expert-constructed hierarchies, or use hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.