The universal approximation power of finite-width deep ReLU networks
This work contributes to the theoretical foundations of deep learning by showing that deep networks approximate complex functions efficiently. The advance is incremental but important for understanding what neural networks can represent.
The paper demonstrates that finite-width deep ReLU networks achieve rate-distortion optimal approximation for diverse signal structures, including polynomials, oscillatory textures, and certain fractals such as the Weierstrass function, with approximation error decaying exponentially in the number of neurons.
We show that finite-width deep ReLU neural networks yield rate-distortion optimal approximation (Bölcskei et al., 2018) of polynomials, windowed sinusoidal functions, one-dimensional oscillatory textures, and the Weierstrass function, a fractal function which is continuous but nowhere differentiable. Together with the recently established universal approximation property of deep networks for affine function systems (Bölcskei et al., 2018), this shows that deep neural networks approximate vastly different signal structures generated by the affine group, the Weyl-Heisenberg group, or through warping, and even certain fractals, all with approximation error decaying exponentially in the number of neurons. We also prove that, in the approximation of sufficiently smooth functions, finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.
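To make the claim of exponentially decaying error concrete for the simplest polynomial, the following is a minimal sketch of the well-known sawtooth construction (due to Yarotsky) for approximating x^2 on [0, 1] with a deep ReLU network of fixed width. It is an illustration under stated assumptions, not necessarily the construction used in this paper; the function names `relu`, `hat`, `square_approx`, and `max_error` are hypothetical and chosen only for this example. Each added layer composes one more tent map, and the resulting piecewise-linear interpolant has maximum error 2^(-2m-2) at depth m, so the error shrinks by a factor of four per layer while the number of neurons grows only linearly.

```python
import numpy as np


def relu(x):
    """Rectified linear unit."""
    return np.maximum(x, 0.0)


def hat(x):
    """Tent map on [0, 1], written as a sum of three ReLU units."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def square_approx(x, depth):
    """Approximate x**2 on [0, 1] by x - sum_{s=1..depth} hat^{(s)}(x) / 4**s.

    hat^{(s)} is the s-fold composition of the tent map, so each term
    costs one additional layer of constant width.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    g = x
    for s in range(1, depth + 1):
        g = hat(g)                 # one more layer: compose the tent map
        out = out - g / 4.0 ** s   # subtract the scaled sawtooth
    return out


def max_error(depth, n_points=100001):
    """Largest sampled deviation from x**2 on a uniform grid over [0, 1]."""
    xs = np.linspace(0.0, 1.0, n_points)
    return float(np.max(np.abs(square_approx(xs, depth) - xs ** 2)))


if __name__ == "__main__":
    for m in range(1, 7):
        # Compare the sampled error with the theoretical bound 2^(-2m-2).
        print(m, max_error(m), 2.0 ** (-2 * m - 2))
```

Since the depth-m network uses a number of ReLU units proportional to m, the bound 2^(-2m-2) is one instance of error decaying exponentially in the number of neurons. Products, and hence general polynomials, can be built from squaring via the identity xy = ((x + y)^2 - x^2 - y^2) / 2.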