LG | Jun 12, 2022

Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning

Kushal Chauhan, Soumya Chatterjee, Akash Reddy, Balaraman Ravindran, Pradeep Shenoy

arXiv:2206.05750v1 | 1.8 | h-index: 50

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of efficient option reuse for continual learning agents. The advance is incremental, since it builds on existing hierarchical reinforcement learning frameworks.

The paper tackles the problem of reusing prelearned options across different tasks in hierarchical reinforcement learning. It proposes an option indexing approach that learns an affinity function between options and the items present in the environment. The approach achieves performance competitive with oracular baselines and substantial gains over a baseline that has access to the full option pool.

The options framework in Hierarchical Reinforcement Learning breaks down overall goals into a combination of options or simpler tasks and associated policies, allowing for abstraction in the action space. Ideally, these options can be reused across different higher-level goals; indeed, such reuse is necessary to realize the vision of a continual learning agent that can effectively leverage its prior experience. Previous approaches have only proposed limited forms of transfer of prelearned options to new task settings. We propose a novel option indexing approach to hierarchical learning (OI-HRL), where we learn an affinity function between options and the items present in the environment. This allows us to effectively reuse a large library of pretrained options, in zero-shot generalization at test time, by restricting goal-directed learning to only those options relevant to the task at hand. We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems, by incorporating feedback about the relevance of retrieved options to the higher-level goal. We evaluate OI-HRL in two simulated settings - the CraftWorld and AI2THOR environments - and show that we achieve performance competitive with oracular baselines, and substantial gains over a baseline that has the entire option pool available for learning the hierarchical policy.
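To make the retrieval idea in the abstract concrete, the following is a minimal sketch of option indexing: score each option against the items present in the environment and pass only the top-scoring options to the hierarchical learner. It is not the authors' implementation. The dot-product affinity, the max-over-items aggregation, the hand-set embeddings, and all names (`affinity`, `retrieve_options`, `OPTION_EMBS`, `ITEM_EMBS`) are assumptions made for illustration; in the paper the representations are learned in a meta-training loop.

```python
from typing import Dict, List, Sequence


def affinity(option_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Assumed affinity function: a plain dot product of two embeddings."""
    return sum(o * i for o, i in zip(option_vec, item_vec))


def retrieve_options(
    option_embs: Dict[str, Sequence[float]],
    item_embs: List[Sequence[float]],
    k: int,
) -> List[str]:
    """Return the k options with the highest affinity to any present item."""
    # Assumption: an option's relevance is its best match over all items.
    scores = {
        name: max(
            (affinity(vec, item) for item in item_embs),
            default=float("-inf"),  # no items present: every option ties
        )
        for name, vec in option_embs.items()
    }
    # sorted() is stable, so ties keep the library's insertion order.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical option library with hand-set (not learned) embeddings.
OPTION_EMBS = {
    "chop_tree": [1.0, 0.0, 0.0],
    "mine_iron": [0.0, 1.0, 0.0],
    "collect_water": [0.0, 0.0, 1.0],
}

# Hypothetical embeddings of the items present in one environment.
ITEM_EMBS = [
    [0.9, 0.1, 0.0],  # a tree
    [0.1, 0.8, 0.1],  # an iron deposit
]

relevant = retrieve_options(OPTION_EMBS, ITEM_EMBS, k=2)
print(relevant)
```

The higher-level policy would then be trained over `relevant` only, rather than over the whole option pool, which is the restriction the abstract credits for the gains over the full-pool baseline.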
