AI | LG | Jan 3, 2025

Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning

arXiv:2501.01727v1 | ICAART
Originality: Incremental advance
AI Analysis

This paper addresses sample inefficiency in multi-goal reinforcement learning for humanoid robots. The contribution appears incremental, since it builds on existing methods such as GCPs and MCTS.

The paper tackles the challenge of training humanoid robots on numerous tasks with sparse rewards by proposing a hierarchical goal-conditioned policy planning framework that combines reinforcement learning and automated planning, resulting in enhanced sample efficiency and faster reasoning.

Humanoid robots must master numerous tasks with sparse rewards, posing a challenge for reinforcement learning (RL). We propose a method combining RL and automated planning to address this. Our approach uses short goal-conditioned policies (GCPs) organized hierarchically, with Monte Carlo Tree Search (MCTS) planning using high-level actions (HLAs). Instead of primitive actions, the planning process generates HLAs. A single plan-tree, maintained during the agent's lifetime, holds knowledge about goal achievement. This hierarchy enhances sample efficiency and speeds up reasoning by reusing HLAs and anticipating future actions. Our Hierarchical Goal-Conditioned Policy Planning (HGCPP) framework uniquely integrates GCPs, MCTS, and hierarchical RL, potentially improving exploration and planning in complex tasks.
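To make the idea of planning over HLAs rather than primitive actions more concrete, the following is a minimal sketch, not the paper's HGCPP algorithm. It assumes a toy 1-D chain world with integer states, treats each HLA as "run a short GCP toward a subgoal", and uses plain UCB1 selection in the tree search. All names (`HLA`, `Node`, `run_gcp`, `candidate_subgoals`, `simulate`, `best_plan`) and all parameters (horizon of 3 steps, world size of 10, exploration constant 1.4) are hypothetical and chosen only for illustration. The GCP is a hand-coded stand-in for a learned policy.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HLA:
    """High-level action: a short goal-conditioned policy aimed at `subgoal`."""
    subgoal: int
    visits: int = 0
    value: float = 0.0            # running mean of returns seen after this HLA
    child: "Node | None" = None   # plan-tree node reached by executing the HLA


@dataclass
class Node:
    """Plan-tree node; outgoing edges are HLAs, not primitive actions."""
    state: int
    visits: int = 0
    hlas: dict = field(default_factory=dict)  # subgoal -> HLA


def run_gcp(state, subgoal, horizon=3):
    """Stand-in for a learned GCP: take up to `horizon` unit steps toward `subgoal`."""
    for _ in range(horizon):
        if state == subgoal:
            break
        state += 1 if subgoal > state else -1
    return state


def candidate_subgoals(state, horizon=3, size=10):
    """Assumed subgoal proposal: states exactly `horizon` steps away, inside the world."""
    return [g for g in (state - horizon, state + horizon) if 0 <= g < size]


def ucb1(node, hla, c=1.4):
    """UCB1 score; unvisited HLAs are tried first."""
    if hla.visits == 0:
        return math.inf
    return hla.value + c * math.sqrt(math.log(node.visits) / hla.visits)


def simulate(node, goal, depth):
    """One search iteration; returns 1.0 if `goal` is reached within `depth` HLAs (sparse reward)."""
    if node.state == goal:
        return 1.0
    if depth == 0:
        return 0.0
    for g in candidate_subgoals(node.state):
        node.hlas.setdefault(g, HLA(g))
    hla = max(node.hlas.values(), key=lambda h: ucb1(node, h))
    if hla.child is None:  # expand: execute the GCP once and store the result
        hla.child = Node(run_gcp(node.state, hla.subgoal))
    ret = simulate(hla.child, goal, depth - 1)
    node.visits += 1
    hla.visits += 1
    hla.value += (ret - hla.value) / hla.visits  # incremental mean update
    return ret


def best_plan(node, goal):
    """Read off the most-visited chain of HLA subgoals from the plan-tree."""
    plan = []
    while node.state != goal and node.hlas:
        hla = max(node.hlas.values(), key=lambda h: h.visits)
        if hla.child is None:
            break
        plan.append(hla.subgoal)
        node = hla.child
    return plan


if __name__ == "__main__":
    root = Node(0)            # a single tree object, kept across all iterations
    for _ in range(200):
        simulate(root, goal=9, depth=4)
    print(best_plan(root, goal=9))
```

Because the same `root` is kept across calls, statistics gathered for earlier HLAs remain available to later searches. This loosely mirrors the abstract's single plan-tree maintained over the agent's lifetime, although the sketch omits policy learning, goal sampling, and anything specific to humanoid control.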

