LGROSep 26, 2022

Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

arXiv:2209.13048v13 citationsh-index: 29
Originality Incremental advance
AI Analysis

This work addresses the problem of sparse rewards in meta-RL for real-world applications like robotics, though it is incremental as it builds on existing meta-RL methods by incorporating demonstrations.

The paper tackles the challenge of meta reinforcement learning in sparse reward environments by developing EMRLD algorithms that use sub-optimal demonstrations to guide training, resulting in significant performance improvements over existing approaches in tasks like mobile robot navigation.

Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small (or just a single) number of steps, is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information even if sub-optimal to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes