LG · AI · ML · May 9, 2018

Reward Estimation for Variance Reduction in Deep Reinforcement Learning

arXiv:1805.03359v2 · 45 citations
Originality: Incremental advance
AI Analysis

The paper addresses robustness issues in model-free RL, particularly in applications such as robotics where reward specification is challenging. It is an incremental improvement over existing variance reduction techniques.

The paper tackles the problem of high variance in deep reinforcement learning caused by corrupt or stochastic rewards, proposing a reward and value function estimator that improves performance across various noise types and environments.

Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. However, introduction of corrupt or stochastic rewards can yield high variance in learning. Such corruption may be a direct result of goal misspecification, randomness in the reward signal, or correlation of the reward with external factors that are not known to the agent. Corruption or stochasticity of the reward signal can be especially problematic in robotics, where goal specification can be particularly difficult for complex tasks. While many variance reduction techniques have been studied to improve the robustness of the RL process, handling such stochastic or corrupted reward structures remains difficult. As an alternative for handling this scenario in model-free RL methods, we suggest using an estimator for both rewards and value functions. We demonstrate that this improves performance under corrupted stochastic rewards in both the tabular and non-linear function approximation settings for a variety of noise types and environments. The use of reward estimation is a robust and easy-to-implement improvement for handling corrupted reward signals in model-free RL.
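To make the idea in the abstract concrete, here is a minimal tabular sketch. The abstract does not give the exact update rule, so this is an assumption: the reward estimator is an exponential moving average per state-action pair, and the Q-learning target uses that estimate instead of the raw sampled reward. The names `make_tables`, `update`, `alpha`, `beta`, and `gamma` are hypothetical and chosen only for illustration.

```python
def make_tables(n_states, n_actions):
    """Create zero-initialised tables for Q-values and reward estimates."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    R_hat = [[0.0] * n_actions for _ in range(n_states)]
    return Q, R_hat


def update(Q, R_hat, s, a, r, s_next, done, alpha=0.1, beta=0.1, gamma=0.99):
    """One tabular Q-learning step that uses an estimated reward.

    Assumption (not taken from the paper): R_hat is a running average
    of the observed rewards for each (state, action) pair.
    """
    # Move the reward estimate toward the observed, possibly noisy, reward.
    R_hat[s][a] += beta * (r - R_hat[s][a])
    # Bootstrap from the next state unless the episode has ended.
    bootstrap = 0.0 if done else gamma * max(Q[s_next])
    # Use the smoothed estimate, not the raw sample, in the TD target.
    Q[s][a] += alpha * (R_hat[s][a] + bootstrap - Q[s][a])
    return Q[s][a]
```

With this arrangement a single corrupted reward sample moves the target only by a fraction `beta` of its deviation, which is one plausible reading of how reward estimation would reduce variance in the value update. In the non-linear function approximation setting, the table `R_hat` would presumably be replaced by a learned regressor.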

Code Implementations: 1 repo
