Optimal approximation of continuous functions by very deep ReLU networks
This provides foundational theoretical insights into the trade-offs between depth and approximation efficiency in neural networks, with implications for all of ML/AI.
The paper tackles the problem of approximating continuous functions with deep ReLU networks, establishing a phase diagram of feasible rates and proving that constant-width networks with depth scaling linearly in the number of weights achieve the fastest possible approximation rate, specifically an error bound of O(ω_f(O(W^{-2/ν}))).
We consider approximations of general continuous functions on finite-dimensional cubes by general deep ReLU neural networks and study the approximation rates with respect to the modulus of continuity of the function and the total number of weights $W$ in the network. We establish the complete phase diagram of feasible approximation rates and show that it includes two distinct phases. One phase corresponds to slower approximations that can be achieved with constant-depth networks and continuous weight assignments. The other phase provides faster approximations at the cost of depths necessarily growing as a power law $L\sim W^α, 0<α\le 1,$ and with necessarily discontinuous weight assignments. In particular, we prove that constant-width fully-connected networks of depth $L\sim W$ provide the fastest possible approximation rate $\|f-\widetilde f\|_\infty = O(ω_f(O(W^{-2/ν})))$ that cannot be achieved with less deep networks.