LG · Dec 8, 2023

On the Performance of Temporal Difference Learning With Neural Networks

Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky

arXiv:2312.05397v1 · 9.88 citations · h-index: 41 · ICLR

Originality: Synthesis-oriented

AI Analysis

This work addresses a theoretical gap in the analysis of reinforcement learning, but it is incremental: it builds on existing methods and adds a specific projection technique.

The paper gives a convergence analysis of Neural Temporal Difference Learning in which the parameters are projected onto a fixed-radius ball around their initial values. It shows an approximation bound of O(ε) + Õ(1/√m), where ε is the approximation quality of the best neural network in that ball and m is the width of the hidden layers.

Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(θ_0, ω)$, a ball of fixed radius $ω$ around the initial point $θ_0$. We show an approximation bound of $O(ε) + \tilde{O} (1/\sqrt{m})$ where $ε$ is the approximation quality of the best neural network in $B(θ_0, ω)$ and $m$ is the width of all hidden layers in the network.
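To make the projection step concrete, here is a minimal sketch of projected semi-gradient TD(0) in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the single hidden ReLU layer, the $1/\sqrt{m}$ output scaling, the step size `lr`, the discount `gamma`, and the randomly generated transitions are all hypothetical choices made for the example, as are the function names. Only the idea of projecting onto $B(θ_0, ω)$ after each update comes from the abstract.

```python
import numpy as np

def project_to_ball(theta, theta0, omega):
    """Euclidean projection of theta onto the ball B(theta0, omega)."""
    diff = theta - theta0
    norm = np.linalg.norm(diff)
    if norm <= omega:
        return theta
    return theta0 + diff * (omega / norm)

def value_and_grad(theta, s, d, m):
    """Assumed network: V(s) = a . relu(W s) / sqrt(m), with theta = (W, a) flattened."""
    W = theta[: m * d].reshape(m, d)
    a = theta[m * d:]
    pre = W @ s
    h = np.maximum(pre, 0.0)
    v = a @ h / np.sqrt(m)
    grad_W = np.outer(a * (pre > 0), s) / np.sqrt(m)  # dV/dW
    grad_a = h / np.sqrt(m)                           # dV/da
    return v, np.concatenate([grad_W.ravel(), grad_a])

def projected_neural_td(transitions, theta0, d, m, omega, gamma=0.9, lr=0.05):
    """Semi-gradient TD(0) with a projection onto B(theta0, omega) after each step."""
    theta = theta0.copy()
    for s, r, s_next in transitions:
        v, grad = value_and_grad(theta, s, d, m)
        v_next, _ = value_and_grad(theta, s_next, d, m)
        delta = r + gamma * v_next - v                # TD error
        theta = project_to_ball(theta + lr * delta * grad, theta0, omega)
    return theta

# Toy usage with synthetic transitions (state, reward, next state).
rng = np.random.default_rng(0)
d, m, omega = 3, 16, 0.5
theta0 = rng.standard_normal(m * d + m)
transitions = [(rng.standard_normal(d), 1.0, rng.standard_normal(d))
               for _ in range(200)]
theta = projected_neural_td(transitions, theta0, d, m, omega)
print(np.linalg.norm(theta - theta0) <= omega + 1e-9)
```

The projection keeps every iterate within distance $ω$ of $θ_0$, which is the constraint under which the abstract states its $O(ε) + \tilde{O}(1/\sqrt{m})$ bound. The sketch does not reproduce the multi-layer architecture or the sampling model analyzed in the paper.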
