LG · ML · Oct 30, 2019

Policy Continuation with Hindsight Inverse Dynamics

arXiv:1910.14055v2 · 43 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of sparse rewards in RL for goal-oriented tasks and offers an incremental improvement over existing methods.

The paper tackles the challenge of sparse rewards in goal-oriented reinforcement learning tasks by proposing Policy Continuation with Hindsight Inverse Dynamics (PCHID), which improves sample efficiency and final performance on GridWorld and FetchReach tasks.

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics, which builds on Hindsight Experience Replay; learning therefore proceeds in a self-imitated manner and can be carried out with supervised learning. This work also extends the approach to multi-step settings with Policy Continuation. The proposed method is general: it can work in isolation or be combined with other on-policy and off-policy algorithms. On two multi-goal tasks, GridWorld and FetchReach, PCHID significantly improves the sample efficiency as well as the final performance.
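The idea of learning from hindsight inverse dynamics can be illustrated on a toy grid. The sketch below is a minimal, hypothetical illustration, not the authors' implementation. The 4x4 grid, the helper names (`step`, `collect`, `reaches`, `pchid`) and the lookup table used in place of a supervised network are all assumptions. Hindsight relabeling turns every k-step segment s_t, ..., s_{t+k} of an experienced trajectory into a supervised example "from s_t, to reach goal s_{t+k}, take a_t" (here the achieved goal is simply the grid cell). The multi-step extension is modeled by keeping a k-step example only when the policy built from shorter horizons cannot already reach that goal in fewer than k steps. This is one reading of Policy Continuation and may differ from the paper's exact test.

```python
from typing import Dict, List, Tuple

# Hypothetical toy GridWorld: 4x4 grid, deterministic moves, walls clamp.
SIZE = 4
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

State = Tuple[int, int]
Trajectory = Tuple[List[State], List[str]]
# Assumption: a lookup table stands in for the supervised policy pi(a | s, g).
Policy = Dict[Tuple[State, State], str]


def step(s: State, a: str) -> State:
    """Deterministic transition; moving into a wall leaves the agent in place."""
    dx, dy = MOVES[a]
    return (min(max(s[0] + dx, 0), SIZE - 1), min(max(s[1] + dy, 0), SIZE - 1))


def collect(start: State, actions: List[str]) -> Trajectory:
    """Replay a fixed action sequence and record the visited states."""
    states = [start]
    for a in actions:
        states.append(step(states[-1], a))
    return states, actions


def reaches(policy: Policy, s: State, g: State, max_steps: int) -> bool:
    """True if greedily following `policy` from s hits g within max_steps.
    Assumption: uses the known simulator `step` instead of real rollouts."""
    for _ in range(max_steps):
        if s == g:
            return True
        a = policy.get((s, g))
        if a is None:  # no label for this (state, goal) pair
            return False
        s = step(s, a)
    return s == g


def pchid(trajectories: List[Trajectory], max_k: int) -> Policy:
    """Hypothetical tabular sketch of k-step hindsight inverse dynamics
    with a policy-continuation filter, for k = 1 .. max_k."""
    policy: Policy = {}
    for k in range(1, max_k + 1):
        new_labels: Policy = {}
        for states, actions in trajectories:
            for t in range(len(actions) - k + 1):
                # Hindsight relabeling: the state reached k steps later is the goal.
                s, g = states[t], states[t + k]
                if s == g:
                    continue
                # Continuation test: skip pairs the (k-1)-step policy already solves.
                if reaches(policy, s, g, k - 1):
                    continue
                new_labels.setdefault((s, g), actions[t])
        policy.update(new_labels)
    return policy


if __name__ == "__main__":
    demo = [
        collect((0, 0), ["right", "right", "up"]),
        collect((0, 0), ["up", "right", "down"]),  # 3-step detour to (1, 0)
    ]
    pi = pchid(demo, max_k=3)
    # The detour's first action ("up") is filtered out; the 1-step label is kept.
    print(pi[((0, 0), (1, 0))])
```

Because a pair is labeled only when no shorter route is already known, every labeled (state, goal) pair in this toy table is reached by greedily following the table within `max_k` steps. A realistic version would replace the table with a classifier trained on the relabeled examples and replace `step` with environment rollouts.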

Code Implementations: 1 repo