OC LG NA PR ML · Jun 28, 2025

Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality

arXiv:2506.22851v1 · 1 citation · h-index: 50
Originality: Highly original
AI Analysis

The result provides a theoretical foundation for using deep learning in reinforcement learning, addressing a fundamental scalability issue in high-dimensional control problems.

The paper tackles the curse of dimensionality in solving Bellman equations for Markov decision processes. It proves that deep neural networks with leaky ReLU activation can approximate the associated Q-functions using a number of parameters that grows at most polynomially in the state dimension and in the reciprocal of the error tolerance.
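For reference, here is one standard way to write the Bellman equation behind these Q-functions. It assumes a discount factor $\gamma\in(0,1)$, a payoff function $g_d$, and a random next state $Y^{x,a}_d$ reached from state $x$ under control $a$; this notation is illustrative and not necessarily the paper's. In this form, $Q_d$ is the solution of the fixed-point equation

$$
Q_d(x)_a = \mathbb{E}\Big[g_d\big(x,a,Y^{x,a}_d\big) + \gamma \max_{b\in A} Q_d\big(Y^{x,a}_d\big)_b\Big], \qquad x\in\mathbb{R}^d,\ a\in A.
$$

Because $|\max_b u_b - \max_b v_b| \le \max_b |u_b - v_b|$, the right-hand side is a $\gamma$-contraction in the supremum norm (for bounded $g_d$). Hence $Q_d$ is its unique bounded fixed point, and Picard iteration started from $Q^{(0)}=0$ converges to it. "Polynomial parameter growth" is typically formalized as follows: there exist constants $c,p>0$ such that for every $d\in\mathbb{N}$ and $\varepsilon\in(0,1)$, some DNN with at most $c\,d^{p}\varepsilon^{-p}$ parameters approximates $Q_d$ to $L^2$-accuracy $\varepsilon$. The paper's exact constants, norms, and measures may differ.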

Discrete time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning theory. A central tool for solving MDPs is the Bellman equation and its solution, the so-called $Q$-function. In this article, we construct deep neural network (DNN) approximations for $Q$-functions associated with MDPs with infinite time horizon and finite control set $A$. More specifically, we show that if the payoff function and the random transition dynamics of the MDP can be suitably approximated by DNNs with leaky rectified linear unit (ReLU) activation, then the solutions $Q_d\colon \mathbb R^d\to \mathbb R^{|A|}$, $d\in \mathbb{N}$, of the associated Bellman equations can also be approximated in the $L^2$-sense by DNNs with leaky ReLU activation whose numbers of parameters grow at most polynomially in both the dimension $d\in \mathbb{N}$ of the state space and the reciprocal $1/\varepsilon$ of the prescribed error $\varepsilon\in (0,1)$. Our proof relies on the recently introduced full-history recursive multilevel fixed-point (MLFP) approximation scheme.
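To make "full-history recursive multilevel fixed-point" more concrete, the sketch below implements a simplified multilevel Picard-type Monte Carlo estimator for a Bellman equation of the form $Q(x)_a = \mathbb{E}[g(x,a,Y^{x,a}) + \gamma \max_b Q(Y^{x,a})_b]$. It writes the Picard iterate $Q^{(n)}$ as a telescoping sum over levels and estimates level $l$ with $M^{n-l}$ samples, so more samples go to the coarse levels. It reuses all lower-level estimates through recursive calls ("full history") that draw fresh randomness. This is an illustration under stated assumptions, not the paper's exact MLFP scheme. The function `mlfp_q`, the `step`/`reward` interfaces, and the toy MDP are hypothetical, and the step from such estimators to DNN approximations is not reproduced here.

```python
import numpy as np
from typing import Callable

# Hypothetical interfaces for a toy MDP (assumptions, not from the paper):
#   step(x, a, rng)  -> one random next state Y drawn from the transition at (x, a)
#   reward(x, a, y)  -> payoff g(x, a, y) of that transition
Step = Callable[[np.ndarray, int, np.random.Generator], np.ndarray]
Reward = Callable[[np.ndarray, int, np.ndarray], float]


def mlfp_q(n: int, M: int, x: np.ndarray, step: Step, reward: Reward,
           gamma: float, num_actions: int,
           rng: np.random.Generator) -> np.ndarray:
    """Multilevel Picard-type estimate U_{n,M}(x) of Q(x, .) in R^{|A|}.

    Telescopes the Picard iterates Q^(n) = Phi(Q^(n-1)), Q^(0) = 0:
      Q^(n) = Phi(0) + sum_{l=1}^{n-1} [Phi(Q^(l)) - Phi(Q^(l-1))],
    and replaces each expectation by a Monte Carlo average and each
    Q^(l) by a recursive estimate computed with fresh randomness.
    """
    if n == 0:
        return np.zeros(num_actions)  # U_{0,M} := 0, the Picard starting point
    q = np.zeros(num_actions)
    for a in range(num_actions):
        # Level 0: plain Monte Carlo average of the payoff with M^n samples.
        total = 0.0
        for _ in range(M ** n):
            total += reward(x, a, step(x, a, rng))
        q[a] = total / M ** n
        # Levels l = 1, ..., n-1: corrections
        # gamma * (max_b U_l(Y)_b - max_b U_{l-1}(Y)_b), with M^(n-l) samples.
        for l in range(1, n):
            acc = 0.0
            for _ in range(M ** (n - l)):
                y = step(x, a, rng)
                fine = mlfp_q(l, M, y, step, reward, gamma, num_actions, rng)
                coarse = mlfp_q(l - 1, M, y, step, reward, gamma,
                                num_actions, rng)
                acc += gamma * (fine.max() - coarse.max())
            q[a] += acc / M ** (n - l)
    return q


if __name__ == "__main__":
    # Toy 2-d MDP (an assumption): action 0 drifts left, action 1 drifts
    # right, Gaussian noise is added, and the payoff -|y|^2 rewards staying
    # near the origin.
    rng = np.random.default_rng(0)
    drift = np.array([[-0.5, 0.0], [0.5, 0.0]])
    step = lambda x, a, gen: x + drift[a] + 0.1 * gen.standard_normal(x.shape)
    reward = lambda x, a, y: -float(np.sum(y ** 2))
    print(mlfp_q(3, 3, np.array([1.0, 0.0]), step, reward, 0.9, 2, rng))
```

With a constant payoff every sample is identical, so the estimator reproduces the Picard iterate exactly: for payoff 1 and $\gamma = 0.5$, `mlfp_q(3, ...)` returns $1 + 0.5 + 0.25 = 1.75$ for each action. The recursion costs grow quickly with `n`, which is why this toy only uses `n = 3`. The design choice of spending many samples on cheap coarse levels and few on expensive fine levels is the core multilevel idea.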
