LG OC ML
Dec 10, 2019

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

arXiv:1912.04511v222.782 citations

Originality Highly original

AI Analysis

This paper provides theoretical foundations for a widely used deep RL algorithm, addressing a key gap in understanding its finite-time behavior under realistic non-i.i.d. data assumptions.

The paper tackles the lack of non-asymptotic convergence guarantees for Q-learning with neural network function approximation in deep reinforcement learning. It proves that the algorithm converges to the optimal policy at an O(1/√T) rate, where T is the number of iterations, when the data are generated from a Markov decision process and the action-value function is approximated by a sufficiently overparameterized ReLU network.

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under the non-i.i.d. data assumption.
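To make the setting in the abstract concrete, here is a minimal sketch of a Q-learning update with a ReLU network, run along a single Markovian trajectory so that consecutive samples are not i.i.d. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the abstract describes a deep ReLU network, while this sketch uses a two-layer one for brevity, and the toy chain MDP, the step size, the network width, and all function names (`init_params`, `q_values`, `q_learning_step`, `run_trajectory`) are hypothetical.

```python
import numpy as np

def init_params(state_dim, n_actions, width, rng):
    # Hypothetical two-layer ReLU network: Q(s, .) = W2 @ relu(W1 @ s).
    W1 = rng.normal(0.0, np.sqrt(2.0 / state_dim), size=(width, state_dim))
    W2 = rng.normal(0.0, np.sqrt(1.0 / width), size=(n_actions, width))
    return W1, W2

def q_values(params, s):
    # Action values for every action in state s.
    W1, W2 = params
    return W2 @ np.maximum(W1 @ s, 0.0)

def q_learning_step(params, s, a, r, s_next, gamma, lr):
    # One semi-gradient Q-learning update: the bootstrapped target
    # r + gamma * max_b Q(s', b) is treated as a constant.
    W1, W2 = params
    pre = W1 @ s
    h = np.maximum(pre, 0.0)
    td_error = W2[a] @ h - (r + gamma * np.max(q_values(params, s_next)))
    # Gradient of Q(s, a) with respect to each weight matrix.
    grad_W2 = np.zeros_like(W2)
    grad_W2[a] = h
    grad_W1 = np.outer(W2[a] * (pre > 0), s)
    new_params = (W1 - lr * td_error * grad_W1, W2 - lr * td_error * grad_W2)
    return new_params, td_error

def run_trajectory(T=2000, seed=0, gamma=0.9, lr=0.01, eps=0.2):
    # Toy 4-state chain (an assumption for illustration): action 1 moves
    # right, action 0 moves left, reward 1 on reaching the rightmost state.
    rng = np.random.default_rng(seed)
    n_states, n_actions = 4, 2
    params = init_params(n_states, n_actions, width=64, rng=rng)
    s_idx, td_errors = 0, []
    for _ in range(T):
        s = np.eye(n_states)[s_idx]
        if rng.random() < eps:
            a = int(rng.integers(n_actions))          # explore
        else:
            a = int(np.argmax(q_values(params, s)))   # exploit
        next_idx = min(max(s_idx + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if next_idx == n_states - 1 else 0.0
        s_next = np.eye(n_states)[next_idx]
        params, delta = q_learning_step(params, s, a, r, s_next, gamma, lr)
        td_errors.append(float(delta))
        s_idx = next_idx  # the next sample depends on this one (non-i.i.d.)
    return params, td_errors
```

Calling `run_trajectory(T=2000)` returns the final weights and the sequence of temporal-difference errors, which can be inspected to see how the updates behave as the number of iterations grows. The sketch makes no claim about reproducing the rate stated in the abstract.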
