LG · AI · ML · May 4, 2025

Universal Approximation Theorem of Deep Q-Networks

arXiv:2505.02288v1 · 5 citations · h-index: 2 · ICML
Originality: Incremental advance
AI Analysis

This work bridges deep reinforcement learning and stochastic control, offering insights for applications with physical systems or high-frequency data, but it is incremental as it extends existing approximation theorems to a continuous-time context.

The authors analyze Deep Q-Networks (DQNs) in continuous time by building a framework on stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). They show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability.

We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). Considering a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze DQN approximation properties. We show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability, leveraging residual network approximation theorems and large deviation bounds for the state-action process. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between DQN layer count, time discretization, and the role of viscosity solutions (primarily for the value function $V^*$) in addressing potential non-smoothness of the optimal Q-function. This work bridges deep reinforcement learning and stochastic control, offering insights into DQNs in continuous-time settings, relevant for applications with physical systems or high-frequency data.
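The link between layer count and time discretization mentioned above can be illustrated with a minimal sketch. It assumes a residual Q-network in which each block acts as one explicit Euler step of size `horizon / n_layers`. The abstract does not specify the paper's architecture, so the `tanh` activation, the layer widths, and the names `init_residual_q_network` and `q_values` are all hypothetical.

```python
import numpy as np


def init_residual_q_network(state_dim, n_actions, width, n_layers, seed=0):
    """Random parameters for an illustrative residual Q-network (hypothetical layout)."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(width)
    return {
        # Lift the state into a hidden space of size `width`.
        "W_in": rng.normal(0.0, 1.0 / np.sqrt(state_dim), (state_dim, width)),
        # One (weight, bias) pair per residual block.
        "blocks": [
            (rng.normal(0.0, scale, (width, width)), np.zeros(width))
            for _ in range(n_layers)
        ],
        # Read out one Q-value per discrete action.
        "W_out": rng.normal(0.0, scale, (width, n_actions)),
    }


def q_values(params, state, horizon=1.0):
    """Forward pass: each residual block is one Euler step h <- h + dt * f(h)."""
    n_layers = len(params["blocks"])
    dt = horizon / n_layers  # more layers means a finer time step (assumption)
    h = state @ params["W_in"]
    for W, b in params["blocks"]:
        h = h + dt * np.tanh(h @ W + b)
    return h @ params["W_out"]


params = init_residual_q_network(state_dim=3, n_actions=2, width=16, n_layers=8)
q = q_values(params, np.ones(3))
greedy_action = int(np.argmax(q))
```

Doubling `n_layers` while keeping `horizon` fixed halves the step size `dt`. This is the sense in which network depth plays the role of a time-discretization parameter in this sketch.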

