LG · AI · Nov 14, 2022

On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

arXiv:2211.07675v2 · 3 citations · h-index: 43
Originality: Incremental advance
AI Analysis

The paper provides a theoretical foundation for a deep reinforcement learning algorithm (Fitted Q-Iteration with a neural network), addressing a gap in understanding for AI researchers and practitioners.

The paper advances the theoretical understanding of deep Q-learning by analyzing Fitted Q-Iteration with a two-layer ReLU neural network. It establishes an order-optimal sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$ for countable state spaces without structural assumptions on the MDP, such as linearity or low rank.

Deep Q-learning based algorithms have been applied successfully to many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and establish sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration using a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require any assumptions such as a linear or low-rank structure on the MDP.
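
To make the loop described in the abstract concrete, here is a minimal sketch of Fitted Q-Iteration with a two-layer ReLU network. It is not the paper's method. As a simplifying assumption, the hidden layer is random and frozen, and only the output weights are refit in each iteration by ridge regression. That subproblem is convex, but it is a stand-in for the paper's convex formulation, not a reproduction of it. The names `relu_features` and `fitted_q_iteration`, and the parameters `width` and `ridge`, are hypothetical.

```python
import numpy as np


def relu_features(x, W, b):
    """Hidden layer of a two-layer ReLU network: max(0, xW + b)."""
    return np.maximum(0.0, x @ W + b)


def fitted_q_iteration(transitions, n_actions, gamma=0.9, n_iters=50,
                       width=64, ridge=1e-3, seed=0):
    """Run FQI on a batch of (state, action, reward, next_state) tuples.

    States are fixed-length feature vectors. Returns a function that maps
    a state to an array of Q-value estimates, one per action.
    """
    rng = np.random.default_rng(seed)
    s = np.array([t[0] for t in transitions], dtype=float)
    a = np.array([t[1] for t in transitions], dtype=int)
    r = np.array([t[2] for t in transitions], dtype=float)
    s2 = np.array([t[3] for t in transitions], dtype=float)

    # Assumption: hidden weights are random and frozen, so each regression
    # below is a convex (ridge) problem in the output weights only.
    W = rng.normal(size=(s.shape[1], width))
    b = rng.normal(size=width)
    phi = relu_features(s, W, b)
    phi_next = relu_features(s2, W, b)

    theta = np.zeros((n_actions, width))  # one output vector per action
    for _ in range(n_iters):
        # Bellman targets computed from the previous iterate.
        y = r + gamma * (phi_next @ theta.T).max(axis=1)
        new_theta = theta.copy()
        for act in range(n_actions):
            mask = a == act
            if not mask.any():
                continue  # no data for this action: keep old weights
            gram = phi[mask].T @ phi[mask] + ridge * np.eye(width)
            new_theta[act] = np.linalg.solve(gram, phi[mask].T @ y[mask])
        theta = new_theta

    def q_values(state):
        x = np.asarray(state, dtype=float)
        return relu_features(x, W, b) @ theta.T

    return q_values
```

Calling `fitted_q_iteration(batch, n_actions=2, gamma=0.5)` on a list of transition tuples returns a callable Q-function; the greedy policy is then `np.argmax(q(state))`. The sample complexity result in the abstract concerns how many such transitions are needed for an $\epsilon$-accurate Q-function, which this sketch does not attempt to verify.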
