LG, RO, ML | Nov 3, 2019

Learning from Trajectories via Subgoal Discovery

arXiv:1911.07224v1 | 51 citations
Originality: Incremental advance
AI Analysis

This paper addresses the sample inefficiency of reinforcement learning on sparse-reward tasks. It offers a novel hybrid approach that could benefit robotics and autonomous systems, though the advance appears incremental because it combines existing techniques.

The paper tackles the problem of learning complex goal-oriented tasks with sparse rewards. It decomposes expert trajectories into sub-goals, which lets the agent switch from imitation learning to reinforcement learning, and it succeeds on tasks where prior methods fail.

Learning to solve complex goal-oriented tasks with sparse, terminal-only rewards often requires an enormous number of samples. In such cases, a set of expert trajectories can help the agent learn faster. However, Imitation Learning (IL) via supervised pre-training on these trajectories may not perform well and generally requires additional fine-tuning with an expert in the loop. In this paper, we propose an approach that uses the expert trajectories to learn a decomposition of the complex main task into smaller sub-goals. We learn a function that partitions the state space into sub-goals, which can then be used to design an extrinsic reward function. The agent first learns from the trajectories using IL and then switches to Reinforcement Learning (RL) using the identified sub-goals, which alleviates the errors of the IL step. To deal with states that are under-represented in the trajectory set, we also learn a function that modulates the sub-goal predictions. We show that our method solves complex goal-oriented tasks that other RL and IL methods in the literature, and their combinations, are not able to solve.
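
The idea of partitioning the state space into sub-goals and turning that partition into a reward can be illustrated with a small sketch. This is not the paper's implementation. It assumes each expert trajectory is split into equal-length ordered segments, uses a nearest-centroid rule as a stand-in for the learned partition function, and pays a fixed bonus whenever the agent moves to a later sub-goal. All names (`equipartition_labels`, `fit_centroids`, `predict_subgoal`, `subgoal_reward`) and the `bonus` parameter are hypothetical. The function that modulates predictions for under-represented states is not covered.

```python
from typing import List, Sequence

State = Sequence[float]


def equipartition_labels(length: int, n_subgoals: int) -> List[int]:
    """Label each step of a trajectory with one of n_subgoals equal, ordered segments."""
    return [min(t * n_subgoals // length, n_subgoals - 1) for t in range(length)]


def fit_centroids(trajectories: List[List[State]], n_subgoals: int) -> List[List[float]]:
    """Assumed stand-in for the learned partition: one mean state per sub-goal."""
    dim = len(trajectories[0][0])
    sums = [[0.0] * dim for _ in range(n_subgoals)]
    counts = [0] * n_subgoals
    for traj in trajectories:
        labels = equipartition_labels(len(traj), n_subgoals)
        for state, label in zip(traj, labels):
            counts[label] += 1
            for d in range(dim):
                sums[label][d] += state[d]
    # A sub-goal with no assigned states keeps an all-zero centroid.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def predict_subgoal(state: State, centroids: List[List[float]]) -> int:
    """Return the index of the sub-goal whose centroid is closest to the state."""
    dists = [sum((a - b) ** 2 for a, b in zip(state, c)) for c in centroids]
    return dists.index(min(dists))


def subgoal_reward(prev_state: State, next_state: State,
                   centroids: List[List[float]], bonus: float = 1.0) -> float:
    """Assumed extrinsic reward: a bonus when the agent advances to a later sub-goal."""
    before = predict_subgoal(prev_state, centroids)
    after = predict_subgoal(next_state, centroids)
    return bonus if after > before else 0.0


# Toy usage: one expert trajectory along a line, split into three sub-goals.
expert = [[[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]]
centroids = fit_centroids(expert, n_subgoals=3)
reward = subgoal_reward([0.2], [2.4], centroids)
```

In this sketch the reward is denser than the terminal-only signal because progress through the ordered sub-goals is rewarded along the way. An RL agent initialized by IL could then be trained against `subgoal_reward` added to the environment reward.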

Code Implementations: 1 repo
