LG | AI | OC | May 7, 2024

An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

arXiv:2405.04017v1 | 3 citations | h-index: 4 | ICML
Originality: Incremental advance
AI Analysis

This work provides a theoretical foundation for reinforcement learning practitioners using neural TD methods, though it is incremental as it improves upon existing bounds.

The paper tackles the theoretical challenge of analyzing temporal difference learning with deep neural networks by developing an improved non-asymptotic analysis, achieving a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1})$ under Markovian sampling, compared to the previous best of $\tilde{\mathcal{O}}(\epsilon^{-2})$.
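
As a rough illustration of the gap between the two rates, take a target accuracy $\epsilon = 10^{-2}$ and ignore the constants and logarithmic factors that $\tilde{\mathcal{O}}$ hides. The figures below are orders of magnitude under that simplification, not sample counts reported in the paper:

$$
\epsilon^{-1} = 10^{2}, \qquad \epsilon^{-2} = 10^{4}, \qquad \frac{\epsilon^{-2}}{\epsilon^{-1}} = \epsilon^{-1} = 10^{2}.
$$

Under that simplification, the improved bound asks for roughly a factor of $1/\epsilon$ fewer samples.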

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
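
For readers unfamiliar with the setting, the following is a minimal sketch of a semi-gradient TD(0) update for an action-value network, run along a single trajectory so that consecutive samples are Markovian rather than i.i.d. It is not the paper's algorithm: the two-layer ReLU network (the paper analyzes a general $L$-layer network), the random toy MDP, the uniform policy, the one-hot encoding, the step size, and every function name here are assumptions chosen for illustration.

```python
import numpy as np


def init_params(rng, in_dim, width):
    # Assumed two-layer ReLU network: Q(x) = w2 . relu(W1 x).
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(width, in_dim)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(width), size=width),
    }


def q_value(params, x):
    hidden = np.maximum(params["W1"] @ x, 0.0)
    return float(params["w2"] @ hidden)


def q_grad(params, x):
    # Gradient of Q(x) with respect to each parameter block.
    pre = params["W1"] @ x
    mask = (pre > 0.0).astype(float)
    return {
        "W1": np.outer(params["w2"] * mask, x),
        "w2": np.maximum(pre, 0.0),
    }


def td_step(params, x, reward, x_next, gamma, lr):
    # Semi-gradient TD(0): the bootstrapped target is treated as a constant.
    delta = reward + gamma * q_value(params, x_next) - q_value(params, x)
    grad = q_grad(params, x)
    new_params = {k: params[k] + lr * delta * grad[k] for k in params}
    return new_params, delta


def one_hot(s, a, n_states, n_actions):
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x


def run_neural_td(n_steps=2000, n_states=4, n_actions=2, width=32,
                  gamma=0.9, lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical toy MDP: random transition kernel P[s, a, :] and rewards R[s, a].
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    params = init_params(rng, n_states * n_actions, width)

    s, a = 0, int(rng.integers(n_actions))
    abs_td_errors = []
    for _ in range(n_steps):
        # Markovian sampling: the next sample continues the same trajectory.
        s_next = int(rng.choice(n_states, p=P[s, a]))
        a_next = int(rng.integers(n_actions))  # uniform policy being evaluated
        x = one_hot(s, a, n_states, n_actions)
        x_next = one_hot(s_next, a_next, n_states, n_actions)
        params, delta = td_step(params, x, R[s, a], x_next, gamma, lr)
        abs_td_errors.append(abs(delta))
        s, a = s_next, a_next
    return params, abs_td_errors


if __name__ == "__main__":
    _, errs = run_neural_td()
    print("mean |TD error|, first 100 steps:", np.mean(errs[:100]))
    print("mean |TD error|, last 100 steps: ", np.mean(errs[-100:]))
```

A finite-time analysis of the kind described above bounds how quickly the error of such iterates shrinks as a function of the number of samples; the sketch only shows the update being analyzed and makes no claim about matching the stated rates.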

