LG · AI · RO · Feb 25, 2021

Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning

arXiv:2102.12962v3 · 8 citations
AI Analysis

This work addresses sample inefficiency in multi-goal reinforcement learning for applications such as robot manipulation, but its contribution is incremental, since it builds on existing HER techniques.

The paper tackles sparse rewards and sample inefficiency in multi-goal reinforcement learning by proposing Multi-step Hindsight Experience Replay (MHER) together with two bias-reduced variants, and reports significantly higher sample efficiency than existing methods such as HER and Curriculum-guided HER on robotic tasks.

Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle both challenges via goal relabeling. However, HER-related methods still require millions of samples and substantial computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), which incorporates multi-step relabeled returns based on $n$-step relabeling to improve sample efficiency. Despite the advantages of $n$-step relabeling, we prove theoretically and experimentally that the off-policy $n$-step bias introduced by $n$-step relabeling may lead to poor performance in many environments. To address this issue, we present two bias-reduced MHER algorithms, MHER($\lambda$) and Model-based MHER (MMHER). MHER($\lambda$) exploits the $\lambda$-return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions successfully alleviate the off-policy $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER, with little additional computation beyond HER.
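The relabeled $n$-step return and the $\lambda$-return mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sparse reward (0 within a tolerance `tol`, else -1), the helper names `sparse_reward`, `relabeled_n_step_targets` and `truncated_lambda_return`, and the default `gamma` are all hypothetical, and critic values are passed in as precomputed numbers. For a relabeled goal $g$, the $i$-step target is taken to be $y^{(i)} = \sum_{k=0}^{i-1} \gamma^k r(s_{t+k+1}, g) + \gamma^i Q(s_{t+i}, \pi(s_{t+i}, g), g)$, and the weighting is the textbook truncated $\lambda$-return, which may differ in detail from the exact MHER($\lambda$) formulation.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    # Hypothetical sparse reward (assumption): 0.0 if the achieved goal lies
    # within `tol` of the goal, otherwise -1.0.
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal))
    return 0.0 if dist < tol else -1.0


def relabeled_n_step_targets(next_achieved_goals, goal, bootstrap_values, gamma=0.98):
    """Hypothetical helper: i-step targets y^(1), ..., y^(n) for one relabeled goal.

    next_achieved_goals[k]: goal achieved after action a_{t+k}, for k = 0..n-1.
    bootstrap_values[i-1]: precomputed critic value Q(s_{t+i}, pi(s_{t+i}, g), g).
    """
    n = len(next_achieved_goals)
    targets = np.empty(n)
    discounted_sum = 0.0
    for i in range(1, n + 1):
        # Rewards are recomputed under the relabeled goal (the "hindsight" step).
        reward = sparse_reward(next_achieved_goals[i - 1], goal)
        discounted_sum += gamma ** (i - 1) * reward
        # i-step target: discounted relabeled rewards plus a bootstrapped critic value.
        targets[i - 1] = discounted_sum + gamma ** i * bootstrap_values[i - 1]
    return targets


def truncated_lambda_return(targets, lam):
    """Textbook truncated lambda-return: (1 - lam) * sum_{i<n} lam^(i-1) y^(i) + lam^(n-1) y^(n)."""
    n = len(targets)
    # Weights sum to 1; lam = 0 keeps only y^(1), lam = 1 keeps only y^(n).
    weights = [(1.0 - lam) * lam ** (i - 1) for i in range(1, n)] + [lam ** (n - 1)]
    return float(np.dot(weights, targets))


if __name__ == "__main__":
    # Toy 3-step segment, relabeled with the goal reached at its last step.
    achieved = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
    relabeled_goal = achieved[-1]
    critic = np.array([-1.5, -0.8, 0.0])  # made-up critic outputs
    ys = relabeled_n_step_targets(achieved, relabeled_goal, critic, gamma=0.9)
    for lam in (0.0, 0.5, 1.0):
        print(lam, truncated_lambda_return(ys, lam))
```

Setting `lam=0` recovers the one-step HER target and `lam=1` gives the plain $n$-step target. For $i \ge 2$, the target includes rewards that follow actions chosen by the behavior policy rather than the current policy, which is where the off-policy $n$-step bias described in the abstract enters; an intermediate $\lambda$ trades that bias against faster reward propagation.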

