AI
Mar 29, 2016

Algorithms for Batch Hierarchical Reinforcement Learning

arXiv:1603.08869v1
Originality: Incremental advance
AI Analysis

This paper addresses data efficiency and flexibility in hierarchical reinforcement learning, though the advance is incremental.

The paper tackles the problem of learning hierarchical policies from a fixed dataset in reinforcement learning. It introduces Hierarchical Q-value Iteration (HQI), reports that HQI converges faster than a flat Q-value Iteration on the Taxi domain, and shows that different hierarchical structures can be compared without collecting new data.

Hierarchical Reinforcement Learning (HRL) exploits temporal abstraction to solve large Markov Decision Processes (MDP) and provide transferable subtask policies. In this paper, we introduce an off-policy HRL algorithm: Hierarchical Q-value Iteration (HQI). We show that it is possible to effectively learn recursive optimal policies for any valid hierarchical decomposition of the original MDP, given a fixed dataset collected from a flat stochastic behavioral policy. We first formally prove the convergence of the algorithm for tabular MDP. Then our experiments on the Taxi domain show that HQI converges faster than a flat Q-value Iteration and enjoys easy state abstraction. Also, we demonstrate that our algorithm is able to learn optimal policies for different hierarchical structures from the same fixed dataset, which enables model comparison without recollecting data.
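
To make the batch setting concrete, here is a minimal sketch of the flat baseline the abstract compares against: tabular Q-value iteration run over a fixed dataset gathered by a random behavioral policy. This is not the paper's HQI algorithm, whose details are not given in this summary. The toy chain environment, the function names `collect_batch` and `batch_q_iteration`, and all parameter values are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def collect_batch(n_states, n_steps, seed=0):
    """Collect (s, a, r, s_next, done) tuples on a hypothetical chain MDP.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and resets the agent to state 0.
    The behavioral policy is uniformly random (an illustrative assumption).
    """
    rng = random.Random(seed)
    batch = []
    s = 0
    for _ in range(n_steps):
        a = rng.choice([0, 1])
        s_next = max(s - 1, 0) if a == 0 else s + 1
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0
        batch.append((s, a, r, s_next, done))
        s = 0 if done else s_next
    return batch


def batch_q_iteration(batch, n_actions, gamma=0.9, iterations=100):
    """Flat tabular Q-value iteration using only the fixed batch.

    Each iteration replaces Q(s, a) with the average Bellman target over
    all samples of (s, a) in the batch. No new data is collected.
    """
    # Group the samples by state-action pair once, up front.
    by_pair = defaultdict(list)
    for s, a, r, s_next, done in batch:
        by_pair[(s, a)].append((r, s_next, done))

    q = defaultdict(float)
    for _ in range(iterations):
        new_q = defaultdict(float)
        for (s, a), samples in by_pair.items():
            targets = [
                r if done
                else r + gamma * max(q[(s_next, b)] for b in range(n_actions))
                for r, s_next, done in samples
            ]
            new_q[(s, a)] = sum(targets) / len(targets)
        q = new_q
    return q
```

Calling `batch_q_iteration(collect_batch(5, 2000), n_actions=2)` returns a Q-table whose greedy policy can be read off per state. Because the batch is fixed, the same transitions could in principle be reused to evaluate several candidate decompositions, which is the property the abstract highlights for HQI.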

