LG · AI · RO · ML · Feb 25, 2020

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

arXiv:2002.11089v1 · 95 citations
AI Analysis

This work addresses efficient policy learning in multi-task RL settings. It generalizes existing relabeling methods, though the contribution is incremental in nature.

The paper aims to improve sample efficiency in multi-task reinforcement learning. It shows that hindsight relabeling is equivalent to inverse RL and demonstrates that relabeling data with inverse RL accelerates learning across several task types: goal-reaching, domains with discrete sets of rewards, and domains with linear reward functions.

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.
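
The relabeling question in the abstract ("for what task was it optimal?") can be illustrated with a small sketch for a discrete set of candidate tasks. It is not the authors' implementation. It assumes a maximum-entropy form of inverse RL in which each trajectory's posterior over tasks is proportional to the exponentiated return under that task, divided by a per-task partition function estimated from the batch. The function names `relabel_posteriors` and `relabel` and the toy `example_returns` table are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def relabel_posteriors(returns):
    """Posterior over candidate tasks for each trajectory.

    returns[i][j] is the cumulative reward of trajectory i under task j.
    Assumption: a MaxEnt-style posterior q(j | i) proportional to
    exp(returns[i][j] - log_z[j]), with a uniform prior over tasks.
    """
    n_traj = len(returns)
    n_tasks = len(returns[0])
    # Estimate each task's log partition function from the batch itself,
    # so a task that rewards every trajectory highly gains no advantage.
    log_z = [
        logsumexp([returns[i][j] for i in range(n_traj)]) - math.log(n_traj)
        for j in range(n_tasks)
    ]
    posteriors = []
    for i in range(n_traj):
        logits = [returns[i][j] - log_z[j] for j in range(n_tasks)]
        norm = logsumexp(logits)
        posteriors.append([math.exp(x - norm) for x in logits])
    return posteriors


def relabel(returns):
    """Assign each trajectory the most probable task under the posterior."""
    return [
        max(range(len(q)), key=q.__getitem__)
        for q in relabel_posteriors(returns)
    ]


# Toy data: task 0 gives both trajectories the same high return,
# while task 1 clearly prefers the second trajectory.
example_returns = [[10.0, 0.0], [10.0, 3.0]]
example_labels = relabel(example_returns)
```

On this toy table, picking the task with the highest raw return would label both trajectories with task 0. Subtracting the estimated log partition function instead labels the second trajectory with task 1, since it is the trajectory that stands out under that task's reward.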

Code Implementations: 1 repo
