LG AI NE RO
Nov 16, 2017

Hindsight policy gradients

arXiv:1711.06006v3
75 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of sparse rewards for reinforcement learning agents, enabling more efficient goal-conditional policy learning, though it is incremental as it extends existing methods.

The paper tackles sample-inefficient learning in sparse-reward reinforcement learning environments by introducing hindsight to policy gradient methods. The authors report a remarkable increase in sample efficiency across a diverse selection of such environments.

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
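
As a rough illustration of the hindsight idea described in the abstract, the following minimal sketch re-scores a single trajectory under goals other than the one the agent was pursuing. Everything in it is an assumption made for illustration: the tuple-valued states, the `sparse_reward` function, and the `hindsight_returns` helper are hypothetical and are not taken from the paper or its code. It shows only the relabeling step, not how a policy gradient estimator would use the relabeled rewards.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def sparse_reward(state: State, goal: State) -> float:
    """Hypothetical sparse reward: 1.0 only when the state equals the goal."""
    return 1.0 if state == goal else 0.0


def hindsight_returns(states: List[State], original_goal: State) -> Dict[State, float]:
    """Total reward the same trajectory would have earned under each candidate goal.

    Candidate goals are the intended goal plus every state actually visited.
    """
    candidates = {original_goal, *states}
    return {g: sum(sparse_reward(s, g) for s in states) for g in candidates}


# A short trajectory over 3-bit states that never reaches the intended goal.
trajectory = [(0, 0, 1), (0, 1, 1), (0, 1, 0)]
intended = (1, 1, 1)
returns = hindsight_returns(trajectory, intended)
```

Under the intended goal the trajectory earns no reward, so it carries no learning signal by itself. Re-scored against the states it actually visited, the same trajectory earns a nonzero return for each of them, which is the information that hindsight aims to exploit.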

Code Implementations: 1 repo
