AI · Dec 20, 2024

Autonomous Option Invention for Continual Hierarchical Reinforcement Learning and Planning

arXiv:2412.16395v1 · 7 citations · h-index: 4 · AAAI
Originality: Incremental advance
AI Analysis

The paper addresses the problem of scaling reinforcement learning by enabling transfer and generalization in long-horizon, sparse-reward environments, though it appears to be an incremental advance that builds on existing option-based methods.

The paper tackles the challenge of autonomously learning abstract state and action representations in reinforcement learning by inventing options with symbolic representations for continual settings, achieving superior sample efficiency compared to state-of-the-art methods.

Abstraction is key to scaling up reinforcement learning (RL). However, autonomously learning abstract state and action representations to enable transfer and generalization remains a challenging open problem. This paper presents a novel approach for inventing, representing, and utilizing options, which represent temporally extended behaviors, in continual RL settings. Our approach addresses streams of stochastic problems characterized by long horizons, sparse rewards, and unknown transition and reward functions. Our approach continually learns and maintains an interpretable state abstraction, and uses it to invent high-level options with abstract symbolic representations. These options meet three key desiderata: (1) composability for solving tasks effectively with lookahead planning, (2) reusability across problem instances for minimizing the need for relearning, and (3) mutual independence for reducing interference among options. Our main contributions are approaches for continually learning transferable, generalizable options with symbolic representations, and for integrating search techniques with RL to efficiently plan over these learned options to solve new problems. Empirical results demonstrate that the resulting approach effectively learns and transfers abstract knowledge across problem instances, achieving superior sample efficiency compared to state-of-the-art methods.
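To make the composability desideratum concrete, here is a minimal sketch of lookahead planning over options that carry symbolic representations. It is not the paper's algorithm: the `Option` class, the `plan` function, the symbol names, and the key-and-door task are all hypothetical, and the effects are deterministic for simplicity, whereas the paper targets stochastic problems with learned option policies.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """Hypothetical symbolic description of a temporally extended behavior."""
    name: str
    precondition: frozenset            # abstract symbols that must hold to start
    add_effects: frozenset             # symbols made true when the option ends
    del_effects: frozenset = frozenset()  # symbols made false when the option ends

    def applicable(self, state):
        return self.precondition <= state

    def apply(self, state):
        return (state - self.del_effects) | self.add_effects


def plan(start, goal, options):
    """Breadth-first search over abstract states; returns option names or None."""
    start, goal = frozenset(start), frozenset(goal)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for opt in options:
            if opt.applicable(state):
                nxt = opt.apply(state)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [opt.name]))
    return None


# Illustrative option library (assumed, not taken from the paper).
OPTIONS = [
    Option("go_to_key", frozenset({"at_start"}), frozenset({"at_key"}),
           frozenset({"at_start"})),
    Option("pick_up_key", frozenset({"at_key"}), frozenset({"has_key"})),
    Option("go_to_door", frozenset({"has_key"}), frozenset({"at_door"}),
           frozenset({"at_key"})),
    Option("open_door", frozenset({"at_door", "has_key"}),
           frozenset({"door_open"})),
]

if __name__ == "__main__":
    print(plan({"at_start"}, {"door_open"}, OPTIONS))
```

Because each option is described only by abstract symbols, the same library can be replanned for a different goal without relearning the underlying policies, which is the sense of reusability the abstract describes.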

Code Implementations: 1 repo
