AI, LG | Dec 2, 2021

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Charles Packer, Pieter Abbeel, Joseph E. Gonzalez

arXiv:2112.00901v1 | 13.0 | 21 citations

Originality: Incremental advance

AI Analysis

The work addresses a bottleneck in meta-RL: learning adaptation strategies when rewards are sparse. Removing the need for dense rewards makes meta-RL more practical for real-world applications, where rewards are often sparse.

The paper addresses the difficulty meta-reinforcement learning has in sparse reward environments. It proposes hindsight task relabeling, which allows meta-training entirely with sparse rewards and achieves performance comparable to training with dense rewards on challenging goal-reaching tasks.

Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks. However, current meta-RL approaches struggle to learn in sparse reward environments. Although existing meta-RL algorithms can learn strategies for adapting to new sparse reward tasks, the actual adaptation strategies are learned using hand-shaped reward functions, or require simple environments where random exploration is sufficient to encounter sparse reward. In this paper, we present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward. We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments that previously required dense reward during meta-training to solve. Our approach solves these environments using the true sparse reward function, with performance comparable to training with a proxy dense reward function.
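
The relabeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2D goal-reaching setup, the goal radius, and the choice to sample the hindsight goal uniformly from reached positions are all assumptions made for illustration. The sketch takes a trajectory that earned no sparse reward under its original task, picks a position the agent actually reached as a new "hindsight" task, and recomputes the sparse reward under that task.

```python
import math
import random


def sparse_reward(position, goal, radius=0.2):
    """Sparse goal-reaching reward: 1.0 within `radius` of the goal, else 0.0.

    The radius value is an assumption for this sketch.
    """
    return 1.0 if math.dist(position, goal) <= radius else 0.0


def relabel_trajectory(trajectory, rng, radius=0.2):
    """Relabel a trajectory with a hindsight task (hypothetical helper).

    trajectory: list of (state, action, next_state) tuples with 2D states.
    Returns the hindsight goal and a list of
    (state, action, reward, next_state) tuples rewarded under that goal.
    """
    # Assumption: pick the new task uniformly from positions actually reached.
    _, _, hindsight_goal = rng.choice(trajectory)
    relabeled = [
        (s, a, sparse_reward(s2, hindsight_goal, radius), s2)
        for s, a, s2 in trajectory
    ]
    return hindsight_goal, relabeled


# A short trajectory moving along the x-axis, far from the original goal.
states = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
trajectory = [(states[i], "right", states[i + 1]) for i in range(3)]

original_goal = (5.0, 5.0)
original_rewards = [sparse_reward(s2, original_goal) for _, _, s2 in trajectory]

rng = random.Random(0)
hindsight_goal, relabeled = relabel_trajectory(trajectory, rng)
relabeled_rewards = [r for _, _, r, _ in relabeled]

print("original rewards:", original_rewards)
print("hindsight goal:", hindsight_goal)
print("relabeled rewards:", relabeled_rewards)
```

Under the original task every reward is zero, so the trajectory carries no learning signal. After relabeling, the transition that reaches the hindsight goal receives a reward, which is the property that lets a learner train on sparse reward alone. How relabeled data is fed into the meta-RL learner is outside the scope of this sketch.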
