LG · ML · Oct 30, 2019

Policy Continuation with Hindsight Inverse Dynamics

Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou

arXiv:1910.14055v2

Originality: Incremental advance

AI Analysis

This work addresses the problem of sparse rewards in goal-oriented RL tasks and offers an incremental improvement over existing methods.

The paper tackles the challenge of sparse rewards in goal-oriented reinforcement learning tasks by proposing Policy Continuation with Hindsight Inverse Dynamics (PCHID), which improves sample efficiency and final performance on GridWorld and FetchReach tasks.

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, rewards are often sparse, which makes it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). The approach learns from Hindsight Inverse Dynamics, which builds on Hindsight Experience Replay: the agent learns in a self-imitated manner, so the policy can be trained with supervised learning. This work also extends the approach to multi-step settings with Policy Continuation. The proposed method is general: it can work in isolation or be combined with other on-policy and off-policy algorithms. On two multi-goal tasks, GridWorld and FetchReach, PCHID significantly improves both sample efficiency and final performance.
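To make the self-imitation idea concrete, here is a minimal tabular sketch in a deterministic 5×5 grid. It is not the authors' implementation. Random trajectories are relabeled in hindsight: for a segment from s_t to s_{t+k}, the reached state s_{t+k} becomes the goal and the first action a_t becomes a supervised label for a goal-conditioned policy π(a | s, g). At k = 1 this is inverse dynamics learned from hindsight. For k > 1, the sketch keeps a sample only if the policy learned at shorter horizons cannot already reach the goal in fewer than k steps. This filtering rule is a simplified assumption loosely modeled on Policy Continuation. All names (`step`, `rollout`, `train_pchid`, `SIZE`, `MOVES`) are hypothetical, and a majority-vote table stands in for the neural network and supervised loss used in practice.

```python
import random
from collections import Counter, defaultdict

SIZE = 5
# Hypothetical action encoding for this sketch: 0=up, 1=down, 2=left, 3=right.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(state, action):
    """Deterministic grid move; bumping into a wall leaves the agent in place."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < SIZE and 0 <= nc < SIZE else state


def random_trajectory(length, rng):
    """Roll out a uniformly random policy; returns states s_0..s_T and actions a_0..a_{T-1}."""
    state = (rng.randrange(SIZE), rng.randrange(SIZE))
    states, actions = [state], []
    for _ in range(length):
        action = rng.randrange(4)
        state = step(state, action)
        states.append(state)
        actions.append(action)
    return states, actions


def make_policy(counts):
    """Greedy goal-conditioned policy: most frequent action label seen for (state, goal)."""
    def policy(state, goal):
        seen = counts.get((state, goal))
        return seen.most_common(1)[0][0] if seen else None
    return policy


def rollout(policy, state, goal, max_steps):
    """Return True if the policy reaches goal within max_steps steps."""
    for _ in range(max_steps):
        if state == goal:
            return True
        action = policy(state, goal)
        if action is None:
            return False
        state = step(state, action)
    return state == goal


def train_pchid(num_episodes=2000, length=20, max_k=3, seed=0):
    rng = random.Random(seed)
    data = [random_trajectory(length, rng) for _ in range(num_episodes)]
    counts = defaultdict(Counter)  # (state, hindsight goal) -> Counter of action labels
    policy = make_policy(counts)
    for k in range(1, max_k + 1):
        new_labels = []
        for states, actions in data:
            for t in range(len(actions) - k + 1):
                s, g = states[t], states[t + k]  # hindsight relabeling: reached state is the goal
                if s == g:
                    continue
                # Assumed continuation filter: skip if shorter-horizon policy already solves it.
                if k > 1 and rollout(policy, s, g, k - 1):
                    continue
                new_labels.append(((s, g), actions[t]))
        # Add level-k labels only after scanning, so the filter uses levels < k.
        for key, action in new_labels:
            counts[key][action] += 1
    return policy
```

After `policy = train_pchid()`, calling `policy((0, 0), (1, 2))` should return a first action that moves toward the goal. Pairs farther apart than `max_k` steps were never labeled, so the policy returns `None` for them. The filter matters for correctness: a k-step random segment whose goal is closer than k steps may contain wasted moves, so it is dropped when a shorter solution is already known.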

