LG, IT | Dec 28, 2015

Taming the Noise in Reinforcement Learning via Soft Updates

arXiv:1512.08562v4 | 34.0 | 372 citations

Originality: Incremental advance

AI Analysis

This paper addresses a specific bottleneck in reinforcement learning in noisy environments and offers incremental improvements in convergence rates.

The paper addresses the poor performance of model-free reinforcement learning in noisy environments. It proposes G-learning, a new off-policy algorithm that regularizes value estimates to reduce their bias, which yields faster convergence to the optimal policy and lower learning costs.

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also makes it avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning results in significant improvements of the convergence rate and the cost of the learning process.
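The selection bias described in the abstract, and one way a softened update can reduce it, can be illustrated with a small sketch. The abstract does not give the update rule, so everything here is an assumption made for illustration: costs are minimized, the hard minimum is replaced by a prior-weighted log-sum-exp soft minimum with inverse temperature `beta`, and the names `soft_min`, `g_update`, and `selection_bias` are hypothetical. It should not be read as the authors' exact algorithm.

```python
import numpy as np


def soft_min(values, beta, prior=None):
    """Prior-weighted soft minimum over the last axis (assumed form):
    -(1/beta) * log(sum_a prior[a] * exp(-beta * values[a])).
    Large beta approaches the hard minimum; small beta approaches the prior mean.
    """
    values = np.asarray(values, dtype=float)
    n = values.shape[-1]
    if prior is None:
        prior = np.full(n, 1.0 / n)  # uniform prior policy (assumption)
    # Shift by the minimum for numerical stability before exponentiating.
    shift = values.min(axis=-1, keepdims=True)
    weighted = np.sum(prior * np.exp(-beta * (values - shift)), axis=-1)
    return shift.squeeze(-1) - np.log(weighted) / beta


def g_update(G, s, a, cost, s_next, alpha, gamma, beta, prior=None):
    """One tabular soft update (a sketch, not the paper's verbatim rule).
    G is a 2-D array indexed as G[state, action] and is modified in place.
    """
    target = cost + gamma * soft_min(G[s_next], beta, prior)
    G[s, a] = (1.0 - alpha) * G[s, a] + alpha * target
    return G


def selection_bias(n_actions=10, noise_std=1.0, beta=None, n_trials=20000, seed=0):
    """Average error of the next-state value estimate when every true cost
    is 0 and each action's estimate carries independent Gaussian noise.
    beta=None uses the hard minimum, as a Q-learning-style update would.
    """
    rng = np.random.default_rng(seed)
    estimates = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    if beta is None:
        picked = estimates.min(axis=1)
    else:
        picked = soft_min(estimates, beta)
    return float(picked.mean())


if __name__ == "__main__":
    print("hard-min bias:", selection_bias(beta=None))
    print("soft-min bias (beta=0.5):", selection_bias(beta=0.5))
```

In this toy setting the true value is 0 for every action, so any nonzero average is pure estimation bias. A small `beta` early in learning keeps the estimate close to the prior average, and raising `beta` over time would move the update toward the hard minimum. The schedule for `beta` is left open here because the abstract does not specify one.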
