LG AI NE RO
Sep 28, 2017

Overcoming Exploration in Reinforcement Learning with Demonstrations

arXiv:1709.10089v2
882 citations
AI Analysis

This paper addresses the challenge of making RL practical for real-world robotics tasks with sparse rewards, though the contribution is incremental because it builds on existing methods.

The paper tackles the problem of exploration in reinforcement learning for sparse-reward environments by using demonstrations to learn long-horizon robotics tasks, achieving an order of magnitude speedup over RL alone and often outperforming the demonstrator policy.

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
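The abstract does not say how the demonstrations enter training. One common way to use them in an off-policy method such as DDPG is to keep them in a separate replay buffer and mix a fixed share into every minibatch. The sketch below illustrates that general idea only. The class name `MixedReplay`, the `demo_fraction` parameter, and its default of 0.25 are hypothetical choices made for illustration, not details taken from the paper.

```python
import random
from collections import deque


class MixedReplay:
    """Hypothetical replay storage that keeps demonstrations apart from agent data."""

    def __init__(self, demo_transitions, capacity=100_000, seed=0):
        # Demonstrations are kept for the whole run; agent data is a bounded FIFO.
        self.demo = list(demo_transitions)
        self.agent = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the learning agent."""
        self.agent.append(transition)

    def sample(self, batch_size, demo_fraction=0.25):
        """Return a list of (transition, is_demo) pairs of length batch_size."""
        if not self.demo and not self.agent:
            return []
        if not self.agent:
            n_demo = batch_size          # only demonstrations are available
        elif not self.demo:
            n_demo = 0                   # no demonstrations were provided
        else:
            n_demo = int(batch_size * demo_fraction)  # assumed mixing ratio
        n_agent = batch_size - n_demo

        # Sample with replacement from each source and tag where it came from.
        batch = [(self.rng.choice(self.demo), True) for _ in range(n_demo)]
        batch += [(self.rng.choice(self.agent), False) for _ in range(n_agent)]
        return batch
```

The `is_demo` flag lets a training loop treat demonstration samples differently, for example by applying an auxiliary imitation loss only to them. Early in training, when the agent has rarely or never seen a non-zero reward, the demonstration share of each batch is what carries successful transitions into the update.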

Code Implementations: 3 repos
