AI · LG · May 13, 2021

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

arXiv:2105.06350v1 · 27 citations
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency in goal-oriented RL tasks but appears incremental, since it builds on existing hindsight methods with model-based enhancements.

The paper tackles the reward sparsity problem in goal-oriented reinforcement learning by proposing MapGo, a framework that integrates a new relabeling strategy (FGI) and model-generated trajectories, resulting in higher sample efficiency compared to model-free baselines on complex tasks.

In goal-oriented reinforcement learning, relabeling the raw goals in past experience to give agents hindsight ability is a major solution to the reward sparsity problem. In this paper, to enhance the diversity of relabeled goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels goals by looking into the future with a learned dynamics model. In addition, to improve sample efficiency, we propose using the dynamics model to generate simulated trajectories for policy training. By integrating these two improvements, we introduce the MapGo framework (Model-Assisted Policy Optimization for Goal-oriented tasks). In our experiments, we first show the effectiveness of the FGI strategy compared with hindsight relabeling, and then show that the MapGo framework achieves higher sample efficiency than model-free baselines on a set of complicated tasks.
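The two ideas in the abstract, foresight relabeling and model-generated trajectories, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`foresight_relabel`, `model_rollout`, `sparse_reward`), the toy dynamics, the toy policy, the rollout horizon, and the choice to roll out the current policy conditioned on the original goal are all assumptions made for illustration.

```python
import numpy as np

def sparse_reward(state, goal, achieved_goal, tol=0.05):
    """Sparse goal-reaching reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved_goal(state) - goal) <= tol else -1.0

def foresight_relabel(state, goal, policy, dynamics_model, achieved_goal, horizon):
    """Hypothetical foresight relabeling: roll the learned model forward from
    `state` for `horizon` steps and use the goal achieved at the end as the
    new goal label."""
    s = np.asarray(state, dtype=float)
    for _ in range(horizon):
        s = dynamics_model(s, policy(s, goal))
    return achieved_goal(s)

def model_rollout(state, goal, policy, dynamics_model, achieved_goal, horizon):
    """Generate a simulated trajectory of (s, a, s', g, r) tuples with the model."""
    transitions = []
    s = np.asarray(state, dtype=float)
    for _ in range(horizon):
        a = policy(s, goal)
        s_next = dynamics_model(s, a)
        r = sparse_reward(s_next, goal, achieved_goal)
        transitions.append((s, a, s_next, goal, r))
        s = s_next
    return transitions

# Toy stand-ins (assumptions): a real setup would use a learned neural
# dynamics model and a learned goal-conditioned policy.
def toy_dynamics(s, a):
    return s + a

def toy_policy(s, g):
    return np.clip(g - s, -0.1, 0.1)

def identity_goal(s):
    # The achieved goal is the state itself in this toy example.
    return s.copy()

state = np.zeros(2)
original_goal = np.array([1.0, 1.0])

# Relabel one stored transition with a goal inferred by looking ahead.
new_goal = foresight_relabel(
    state, original_goal, toy_policy, toy_dynamics, identity_goal, horizon=3
)
action = toy_policy(state, original_goal)
next_state = toy_dynamics(state, action)
relabeled = (state, action, next_state, new_goal,
             sparse_reward(next_state, new_goal, identity_goal))

# Generate extra simulated experience for policy training.
rollout = model_rollout(
    state, new_goal, toy_policy, toy_dynamics, identity_goal, horizon=3
)
```

In this toy setting the relabeled goal lies three model steps ahead of the stored state, so the simulated rollout toward it ends with a non-negative reward, which is the kind of dense learning signal relabeling is meant to supply.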

Code Implementations: 1 repo