LG AI RO Feb 25, 2021

Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning

Rui Yang, Jiafei Lyu, Yu Yang, Jiangpeng Yan, Feng Luo, Dijun Luo, Lanqing Li, Xiu Li

arXiv:2102.12962v3 9.28 citations

Originality: Incremental advance

AI Analysis

This work addresses sample inefficiency in multi-goal reinforcement learning for applications like robot manipulation, but it is incremental as it builds on existing HER techniques.

The paper tackles the challenges of sparse rewards and sample inefficiency in multi-goal reinforcement learning by proposing Multi-step Hindsight Experience Replay (MHER) with bias-reduced algorithms, achieving significantly higher sample efficiency than existing methods like HER and Curriculum-guided HER in robotic tasks.

Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Its two main challenges are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle these two challenges via goal relabeling. However, HER-based methods still require millions of samples and substantial computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), which incorporates multi-step relabeled returns based on $n$-step relabeling to improve sample efficiency. Despite the advantages of $n$-step relabeling, we show theoretically and experimentally that the off-policy $n$-step bias it introduces may lead to poor performance in many environments. To address this issue, we present two bias-reduced MHER algorithms, MHER($λ$) and Model-based MHER (MMHER). MHER($λ$) exploits the $λ$ return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions successfully alleviate off-policy $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER, with little additional computation beyond HER.
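The relabeled multi-step targets described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one-dimensional goals, a sparse reward of 0 within a tolerance `eps` of the goal and -1 otherwise, and a hypothetical `q_value(state_index, goal)` callable standing in for a target critic. The $λ$ target shown is one normalized, truncated weighting of the $n$-step targets; the paper's exact weighting may differ. All function and parameter names are illustrative.

```python
from typing import Callable, Sequence


def sparse_reward(achieved: float, goal: float, eps: float = 0.05) -> float:
    """Assumed sparse reward: 0 when within eps of the goal, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= eps else -1.0


def n_step_relabeled_target(
    next_achieved: Sequence[float],      # achieved goal after each transition
    t: int,                              # index of the transition being updated
    goal: float,                         # relabeled (hindsight) goal
    q_value: Callable[[int, float], float],  # hypothetical bootstrap critic
    n: int,
    gamma: float = 0.98,
) -> float:
    """Sum n relabeled rewards, then bootstrap from the state n steps ahead."""
    horizon = min(n, len(next_achieved) - t)  # truncate at the trajectory end
    target = 0.0
    for i in range(horizon):
        # Rewards are recomputed against the relabeled goal, not the original.
        target += gamma**i * sparse_reward(next_achieved[t + i], goal)
    return target + gamma**horizon * q_value(t + horizon, goal)


def lambda_relabeled_target(
    next_achieved: Sequence[float],
    t: int,
    goal: float,
    q_value: Callable[[int, float], float],
    n: int,
    lam: float = 0.7,
    gamma: float = 0.98,
) -> float:
    """Normalized lambda-weighted average of the 1..n step relabeled targets."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    weights = [lam**i for i in range(1, n + 1)]
    targets = [
        n_step_relabeled_target(next_achieved, t, goal, q_value, i, gamma)
        for i in range(1, n + 1)
    ]
    return sum(w * g for w, g in zip(weights, targets)) / sum(weights)


if __name__ == "__main__":
    achieved = [0.0, 0.5, 1.0]           # toy trajectory
    hindsight_goal = achieved[-1]        # relabel with the final achieved goal
    zero_critic = lambda idx, g: 0.0     # placeholder critic for the demo
    print(n_step_relabeled_target(achieved, 0, hindsight_goal, zero_critic, n=2, gamma=0.5))
    print(lambda_relabeled_target(achieved, 0, hindsight_goal, zero_critic, n=2, lam=0.5, gamma=0.5))
```

Smaller values of `lam` put more weight on the short-horizon targets, which depend less on actions collected under older policies. This is the intuition behind using a $λ$ return to reduce off-policy $n$-step bias. A model-based variant in the spirit of MMHER would instead generate the multi-step rollout with a learned dynamics model and the current policy, which this sketch does not attempt.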

