LG, ML | Oct 18, 2021

Provable Hierarchy-Based Meta-Reinforcement Learning

arXiv:2110.09507v1 | 6 citations
Originality Highly original
AI Analysis

This work fills a foundational gap in hierarchical reinforcement learning: it provides theoretical guarantees for learning hierarchies, which matters to AI researchers and practitioners who need reliable, efficient methods for learning complex behaviors.

The paper tackles the lack of provable guarantees in hierarchical reinforcement learning by analyzing it in a meta-RL setting, providing diversity conditions and an algorithm that ensures sample-efficient recovery of latent hierarchical structure and offers regret bounds for downstream tasks.

Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assumes access to expert-constructed hierarchies or uses hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions in the HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
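
The abstract mentions a "tractable optimism-based algorithm" in a tabular setting but does not spell it out here. As a rough illustration of what optimism typically means in tabular RL, and not the paper's own algorithm, the sketch below runs finite-horizon value iteration on an empirical model with a count-based exploration bonus. The function name, the argument layout, and the bonus form `bonus_scale * horizon / sqrt(n)` are all assumptions chosen for illustration.

```python
import numpy as np


def optimistic_q_values(counts, reward_sums, next_counts, horizon, bonus_scale=1.0):
    """Generic optimistic value iteration (illustrative, not the paper's method).

    counts:      (S, A) visit counts per state-action pair
    reward_sums: (S, A) summed observed rewards, each reward assumed in [0, 1]
    next_counts: (S, A, S) observed transition counts
    Returns an array of shape (horizon, S, A) of optimistic Q-values.
    """
    n = np.maximum(counts, 1)                 # avoid division by zero
    r_hat = reward_sums / n                   # empirical mean reward
    p_hat = next_counts / n[:, :, None]       # empirical transition model
    bonus = bonus_scale * horizon / np.sqrt(n)  # assumed bonus; shrinks with visits

    num_states, num_actions = counts.shape
    q = np.zeros((horizon + 1, num_states, num_actions))
    for h in range(horizon - 1, -1, -1):
        v_next = q[h + 1].max(axis=1)         # greedy value at step h + 1
        # Clip at the largest return still achievable from step h.
        q[h] = np.minimum(r_hat + p_hat @ v_next + bonus, horizon - h)
    return q[:horizon]
```

State-action pairs that have never been visited receive the maximum possible value, so a greedy policy with respect to these Q-values is drawn toward unexplored parts of the environment; as counts grow, the bonus decays and the estimates approach those of the empirical model.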
