Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks
This work offers a foundational framework connecting controllability, expressivity, and training dynamics in deep learning, with implications for understanding overparameterization and regularization.
The paper tackles the problem of exact classification and function approximation using deep ReLU networks by proving that any dataset with N points and M classes can be exactly classified with a width-2 network of depth at most 2N + 4M - 1, and provides a constructive universal approximation theorem in L^p spaces with width d+1 and explicit depth bounds for Sobolev functions.
We present a fully constructive analysis of deep ReLU neural networks for classification and function approximation tasks. First, we prove that any dataset with $N$ distinct points in $\mathbb{R}^d$ and $M$ output classes can be exactly classified using a multilayer perceptron (MLP) of width $2$ and depth at most $2N + 4M - 1$, with all network parameters constructed explicitly. This result is sharp with respect to width and is interpreted through the lens of simultaneous or ensemble controllability in discrete nonlinear dynamics. Second, we show that these explicit constructions yield uniform bounds on the parameter norms and, in particular, provide upper estimates for minimizers of standard regularized training loss functionals in supervised learning. As the regularization parameter vanishes, the trained networks converge to exact classifiers with bounded norm, explaining the effectiveness of overparameterized training in the small-regularization regime. We also prove a universal approximation theorem in $L^p(Ω; \mathbb{R}_+)$ for any bounded domain $Ω\subset \mathbb{R}^d$ and $p \in [1, \infty)$, using MLPs of fixed width $d + 1$. The proof is constructive, geometrically motivated, and provides explicit estimates on the network depth when the target function belongs to the Sobolev space $W^{1,p}$. We also extend the approximation and depth estimation results to $L^p(Ω; \mathbb{R}^m)$ for any $m \geq 1$. Our results offer a unified and interpretable framework connecting controllability, expressivity, and training dynamics in deep neural networks.