LG | AI | RO | Jul 3, 2022

USHER: Unbiased Sampling for Hindsight Experience Replay

arXiv:2207.01115v1 | 6 citations | h-index: 25
Originality: Incremental advance
AI Analysis

The paper addresses a known bias in HER that matters to RL practitioners, but the advance is incremental because it builds directly on HER.

The paper tackles the problem of biased value functions in Hindsight Experience Replay (HER) for sparse-reward reinforcement learning by proposing an asymptotically unbiased importance-sampling-based algorithm, showing effectiveness on robotic systems including high-dimensional stochastic environments.

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
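The relabeling step and the bias described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm and does not implement its importance-sampling correction. It only shows plain HER relabeling with the common "final" strategy, plus a one-step environment in which relabeled data inflates the apparent success rate. All names (`Transition`, `sparse_reward`, `relabel_final`, `one_step_episode`, `success_rate`) and the 0 / -1 reward convention are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(next_state: int, goal: int) -> float:
    # Assumed convention: 0 when the goal is reached, -1 otherwise.
    return 0.0 if next_state == goal else -1.0


def relabel_final(episode: List[Transition]) -> List[Transition]:
    # "final" strategy: pretend the last achieved state was the goal all along.
    hindsight_goal = episode[-1].next_state
    return [
        t._replace(goal=hindsight_goal,
                   reward=sparse_reward(t.next_state, hindsight_goal))
        for t in episode
    ]


def one_step_episode(outcome: int, goal: int = 1) -> List[Transition]:
    # Hypothetical toy environment: from state 0, action 0 lands in `outcome`.
    return [Transition(0, 0, outcome, goal, sparse_reward(outcome, goal))]


def success_rate(buffer: List[Transition], goal: int) -> float:
    # Fraction of stored transitions for `goal` that reached it.
    rows = [t for t in buffer if t.goal == goal]
    return sum(t.reward == 0.0 for t in rows) / len(rows)


# Fixed outcomes stand in for a 50/50 stochastic transition to state 1 or 2.
outcomes = [1, 2] * 50
original = [t for o in outcomes for t in one_step_episode(o)]
relabeled = [t for o in outcomes for t in relabel_final(one_step_episode(o))]

true_rate = success_rate(original, goal=1)             # unbiased estimate
her_rate = success_rate(original + relabeled, goal=1)  # inflated by relabeling
print(true_rate, her_rate)
```

In this sketch, every relabeled transition succeeds by construction, so mixing relabeled data into the buffer raises the apparent chance of reaching goal 1 above the true 50%. That is the sense in which bad outcomes are underweighted in a stochastic environment.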

Code Implementations: 1 repo
