LG · AI · ML · May 20, 2018

Nonlinear Distributional Gradient Temporal-Difference Learning

arXiv:1805.07732v3 · 17 citations
Originality: Incremental advance
AI Analysis

Aimed at reinforcement learning practitioners, this work combines distributional RL with gradient TD methods, offering incremental improvements in efficiency and convergence for neural network-based applications.

The paper addresses policy evaluation and control in reinforcement learning by developing distributional variants of gradient temporal-difference algorithms. Distributional GTD2 and distributional TDC are shown to converge to locally optimal solutions for general smooth function approximators such as neural networks, and the per-step computational complexity is linear in the number of parameters.
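For orientation, here is a minimal sketch of the classical, non-distributional TDC update with linear features, which is the kind of gradient TD method the paper extends. It is not the paper's distributional algorithm. The function name `tdc_step`, the step sizes, and the feature vectors are illustrative assumptions. Every operation is a dot product or a vector scaling, which is what makes the per-step cost linear in the number of parameters.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, gamma=0.99, alpha=0.01, beta=0.05):
    """One classical (non-distributional) TDC update with linear features.

    theta: value-function weights; w: auxiliary weights (both 1-D arrays).
    phi, phi_next: feature vectors of the current and next state.
    All names and default step sizes here are illustrative assumptions.
    """
    # TD error under the linear value estimate V(s) = theta . phi(s)
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    # Main update: TD step plus a gradient-correction term
    theta_new = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
    # Auxiliary update: w tracks the expected TD error given the features
    w_new = w + beta * (delta - phi @ w) * phi
    return theta_new, w_new, delta
```

Each call touches every parameter a constant number of times, so the cost is O(d) for d parameters.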

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been shown to outperform standard reinforcement learning in a recent study (Bellemare et al., 2017). In the policy evaluation setting, we design two new algorithms, distributional GTD2 and distributional TDC, using the Cramér distance on the distributional version of the Bellman error objective function; they inherit the advantages of both the nonlinear gradient TD algorithms and the distributional RL approach. In the control setting, we propose distributional Greedy-GQ using a similar derivation. We prove the asymptotic almost-sure convergence of distributional GTD2 and TDC to a locally optimal solution for general smooth function approximators, which include the neural networks widely used in recent studies to solve real-life RL problems. In each step, the computational complexity of each of the three algorithms is linear in the number of parameters of the function approximator, so they can be implemented efficiently for neural networks.
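As an illustration of the distance the abstract mentions, the sketch below computes a squared Cramér distance between two categorical return distributions on a shared, sorted set of atoms, taken here as the integral of the squared difference between their CDFs. The categorical parameterization, the function name `cramer_distance_sq`, and the choice of the squared form are assumptions made for this example. The paper's exact objective may differ.

```python
import numpy as np

def cramer_distance_sq(p, q, atoms):
    """Squared Cramer distance between two categorical distributions.

    p, q: probability vectors over the same sorted support `atoms`.
    Computes the integral of (F_p(x) - F_q(x))^2 dx, where the CDFs are
    step functions that are constant between consecutive atoms.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    atoms = np.asarray(atoms, dtype=float)
    # CDF difference at each atom; it holds until the next atom
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    # Width of each interval between consecutive atoms
    widths = np.diff(atoms)
    # After the last atom both CDFs equal 1, so that term is dropped
    return float(np.sum(cdf_diff[:-1] ** 2 * widths))
```

For example, with `atoms = [0, 1, 2]`, a point mass at 0 (`p = [1, 0, 0]`) and a point mass at 2 (`q = [0, 0, 1]`) give a distance of 2.0, and identical distributions give 0.0.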

