LG · ML · May 7, 2021

Reward prediction for representation learning and reward shaping

arXiv:2105.03172v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses data efficiency for RL practitioners working with sparse rewards and high-dimensional observations. The advance is incremental because it builds on existing methods.

The paper tackles data inefficiency in reinforcement learning, especially under sparse rewards. It learns a state representation for reward prediction and uses the resulting predictor for reward shaping. The authors report that this significantly enhances Actor Critic using Kronecker-factored Trust Region (ACKTR) and Proximal Policy Optimization (PPO) in single-goal visual environments.

One of the fundamental challenges in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem becomes more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in environments with a single, terminating goal state. We augment the training of out-of-the-box RL agents by shaping the reward using our reward predictor during policy learning. Using our representation for preprocessing high-dimensional observations, as well as using the predictor for reward shaping, is shown to significantly enhance Actor Critic using Kronecker-factored Trust Region and Proximal Policy Optimization in single-goal environments with visual inputs.
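
The two ingredients named in the abstract, a smoothed reward target and reward shaping, can be illustrated with a minimal sketch. The abstract does not say how the reward is smoothed or how the predicted reward is combined with the environment reward, so everything specific here is an assumption made for illustration: the Gaussian kernel, the additive shaping rule, and the names `smooth_sparse_reward`, `shape_reward`, `sigma`, and `weight` are hypothetical and not taken from the paper.

```python
import numpy as np


def smooth_sparse_reward(rewards, sigma=1.0):
    """Spread a sparse per-step reward over neighbouring time steps.

    Assumption: Gaussian smoothing along the episode. The kernel is scaled
    so its peak is 1, which keeps the goal step at its original reward.
    """
    rewards = np.asarray(rewards, dtype=float)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    # "full" convolution, then crop so the output aligns with the input steps.
    full = np.convolve(rewards, kernel, mode="full")
    return full[radius:radius + len(rewards)]


def shape_reward(env_reward, predicted_reward, weight=0.5):
    """Assumption: additive shaping, env reward plus a weighted prediction."""
    return env_reward + weight * predicted_reward


# One episode with a single terminating goal state at the last step.
episode_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
targets = smooth_sparse_reward(episode_rewards, sigma=1.0)

# In practice the predictions would come from a learned reward predictor;
# here the smoothed targets stand in for a perfectly trained one.
shaped = [shape_reward(r, p) for r, p in zip(episode_rewards, targets)]
print(np.round(targets, 3))
print(np.round(shaped, 3))
```

With a smoothed target, steps shortly before the goal receive a nonzero training signal, which is the motivation for shaping in sparse-reward settings. A predictor trained on raw rewards would instead reproduce the original sparse signal.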

