RO · AI · LG · Feb 4, 2025

DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents

arXiv:2502.01956v2 · h-index: 3
Originality: Highly original
AI Analysis

This paper addresses the challenge of error-prone planning for HRL agents in complex visual environments and represents a strong incremental improvement over existing methods.

The paper tackles the problem of long-horizon visual planning in Hierarchical Reinforcement Learning by proposing Discrete Hierarchical Planning (DHP), which replaces continuous distance estimates with discrete reachability checks, achieving a 100% success rate (vs 82% baseline) and 73-step average episode length (vs 158-step baseline) in 25-room navigation environments.

Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate $100\%$ success rate (vs $82\%$ baseline) and $73$-step average episode length (vs $158$-step baseline). The method also generalizes to momentum-based control tasks and requires only $\log N$ steps for replanning. Theoretical analysis and ablations validate our design choices.
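
The recursive decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes states are integers on a 1-D line, uses a hand-written boolean `reachable` check in place of a learned reachability model, and uses a hypothetical `propose_subgoal` that returns the midpoint in place of a learned subgoal policy. All names below are illustrative assumptions. The sketch shows only the general shape of the idea: a goal is split recursively until every leaf subtask passes a discrete reachability check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = int  # toy assumption: a state is a position on a 1-D line


@dataclass
class PlanNode:
    """One (start, goal) subtask; children split it into two simpler subtasks."""
    start: State
    goal: State
    left: Optional["PlanNode"] = None
    right: Optional["PlanNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def reachable(start: State, goal: State, horizon: int = 1) -> bool:
    """Discrete check (assumed): True if goal is within `horizon` steps of start.

    Returns a boolean rather than a continuous distance estimate.
    """
    return abs(goal - start) <= horizon


def propose_subgoal(start: State, goal: State) -> State:
    """Hypothetical stand-in for a learned subgoal policy: pick the midpoint."""
    return (start + goal) // 2


def build_plan(start: State, goal: State, max_depth: int) -> Optional[PlanNode]:
    """Recursively split (start, goal) until every leaf is directly reachable.

    Returns None if the depth budget runs out before all leaves are reachable.
    """
    node = PlanNode(start, goal)
    if reachable(start, goal):
        return node
    if max_depth == 0:
        return None
    mid = propose_subgoal(start, goal)
    left = build_plan(start, mid, max_depth - 1)
    right = build_plan(mid, goal, max_depth - 1)
    if left is None or right is None:
        return None
    node.left, node.right = left, right
    return node


def leaves(node: PlanNode) -> List[Tuple[State, State]]:
    """Left-to-right leaf subtasks, i.e. the executable sequence of subgoals."""
    if node.is_leaf():
        return [(node.start, node.goal)]
    return leaves(node.left) + leaves(node.right)


def depth(node: PlanNode) -> int:
    """Number of splitting levels in the plan tree."""
    if node.is_leaf():
        return 0
    return 1 + max(depth(node.left), depth(node.right))


plan = build_plan(0, 8, max_depth=5)
print(leaves(plan))
print(depth(plan))
```

In this toy setting, a goal 8 steps away yields 8 leaf subtasks at tree depth 3, which illustrates why a balanced plan tree over $N$ leaf subtasks has depth on the order of $\log N$. The sketch does not reproduce the paper's advantage estimation, exploration strategy, or replanning procedure.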

