RO · AI · Aug 1, 2022

Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards

Yongle Luo, Yuxin Wang, Kun Dong, Qiang Zhang, Erkang Cheng, Zhiyong Sun, Bo Song

arXiv:2208.00843v2 · 14.238 citations · h-index: 91 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses low exploration efficiency in reinforcement learning for sequential manipulation tasks. Its method builds incrementally on existing techniques such as Hindsight Experience Replay.

The paper tackles exploration in sequential object manipulation tasks with sparse rewards. It proposes RelayHER (RHER), a self-guided continual RL framework that decomposes a task into sub-tasks and uses the policies learned for simpler sub-tasks to guide exploration. This yields significant gains in sample efficiency and a 10/10 success rate on a physical robot task after only 250 episodes.
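To make the decomposition concrete, here is a minimal Python sketch. It assumes a hypothetical rule for a multi-object task in which sub-task k requires the first k objects to lie within `eps` of their goals, together with a 0/-1 sparse reward. The function names, the rule itself, and the 0.05 threshold are illustrative assumptions, not taken from the paper or its code.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def reached(achieved: Vec, desired: Vec, eps: float = 0.05) -> bool:
    """True when an achieved position lies within eps of its desired position."""
    return math.dist(achieved, desired) < eps


def subtask_reward(k: int, achieved: Sequence[Vec], desired: Sequence[Vec],
                   eps: float = 0.05) -> float:
    """Sparse 0/-1 reward for hypothetical sub-task k (1-based):
    0 only when the first k objects are all at their goals."""
    done = all(reached(a, d, eps) for a, d in zip(achieved[:k], desired[:k]))
    return 0.0 if done else -1.0


def current_stage(achieved: Sequence[Vec], desired: Sequence[Vec],
                  eps: float = 0.05) -> int:
    """1-based index of the simplest unsolved sub-task; len(desired) + 1 if all are solved."""
    for k in range(1, len(desired) + 1):
        if subtask_reward(k, achieved, desired, eps) < 0:
            return k
    return len(desired) + 1


# Hypothetical two-block stack: block 1 is placed, block 2 is not.
goals = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.05)]
blocks = [(0.0, 0.0, 0.01), (0.3, 0.2, 0.0)]
print(subtask_reward(1, blocks, goals))  # 0.0
print(subtask_reward(2, blocks, goals))  # -1.0
print(current_stage(blocks, goals))      # 2
```

The example shows the core difficulty: the full task still returns -1 even though the first sub-task already pays off. A relay scheme can therefore learn from the easier reward signal before the full one becomes reachable.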

Exploration with sparse rewards remains a challenging research problem in reinforcement learning (RL). In sequential object manipulation tasks in particular, the RL agent receives only negative rewards until it completes all sub-tasks, which leads to low exploration efficiency. To solve these tasks efficiently, we propose a novel self-guided continual RL framework, RelayHER (RHER). First, RHER decomposes a sequential task into new sub-tasks of increasing complexity and uses Hindsight Experience Replay (HER) to ensure that the simplest sub-task can be learned quickly. Second, we design a multi-goal & multi-task network to learn these sub-tasks simultaneously. Finally, we propose a Self-Guided Exploration Strategy (SGES). With SGES, the learned sub-task policy guides the agent to states that help it learn the more complex sub-task with HER. Through this self-guided exploration and relay policy learning, RHER solves these sequential tasks efficiently, stage by stage. The experimental results show that RHER significantly outperforms vanilla HER in sample efficiency on five single-object and five complex multi-object manipulation tasks (e.g., Push, Insert, ObstaclePush, Stack, TStack). RHER has also been applied to learn a contact-rich push task on a physical robot from scratch, reaching a 10/10 success rate with only 250 episodes.
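The interplay between self-guided exploration and HER can be sketched in Python under stated assumptions. `policy(obs, goal, task_id)` stands in for the multi-goal & multi-task network, and `achieved_of(obs, j)` is a hypothetical helper that returns sub-task j's achieved goal. The guide rule is one plausible reading of the abstract, not the authors' exact SGES: act with the easiest still-unsolved learned sub-task, otherwise explore on the target sub-task with Gaussian noise. `her_future_relabel` follows the standard HER "future" relabeling strategy.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]
# Stand-in for the multi-goal & multi-task network: one policy conditioned on
# the observation, a goal, and an integer sub-task id (hypothetical signature).
Policy = Callable[[Vec, Vec, int], Vec]


def sparse_reward(achieved: Vec, goal: Vec, eps: float = 0.05) -> float:
    """0/-1 sparse reward: 0 only when the achieved goal is within eps of the goal."""
    return 0.0 if math.dist(achieved, goal) < eps else -1.0


def sges_act(policy: Policy, obs: Vec, subtask_goals: Sequence[Vec],
             achieved_of: Callable[[Vec, int], Vec], target: int,
             noise: float, rng: random.Random) -> Tuple[Vec, int]:
    """Choose an action for one step while training sub-task `target`.

    Sub-tasks 0..target-1 are assumed already learned. If one of them is still
    unsolved in the current state, its policy guides the agent (no noise);
    otherwise the target sub-task policy acts with Gaussian exploration noise.
    Returns the action and the sub-task id that produced it.
    """
    for j in range(target):
        if sparse_reward(achieved_of(obs, j), subtask_goals[j]) < 0:
            return policy(obs, subtask_goals[j], j), j
    action = policy(obs, subtask_goals[target], target)
    return tuple(a + rng.gauss(0.0, noise) for a in action), target


def her_future_relabel(episode: List[Dict[str, Vec]], k_future: int,
                       rng: random.Random, eps: float = 0.05) -> List[Dict]:
    """Standard HER 'future' relabeling.

    Each step keeps its original goal and is also copied k_future times with
    the goal replaced by the achieved goal of the same or a later step, so some
    copies earn reward 0 even when the original goal is never reached.
    """
    out: List[Dict] = []
    last = len(episode) - 1
    for t, step in enumerate(episode):
        out.append({**step, "reward": sparse_reward(step["achieved_next"], step["goal"], eps)})
        for _ in range(k_future):
            new_goal = episode[rng.randint(t, last)]["achieved_next"]
            out.append({**step, "goal": new_goal,
                        "reward": sparse_reward(step["achieved_next"], new_goal, eps)})
    return out
```

Keeping guided steps noise-free is a choice made in this sketch so that the learned easier policy reliably delivers the agent to the states where the harder sub-task begins. The transitions collected there are then relabeled with HER like any others.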

