Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces
This work addresses a foundational problem in deep learning, relevant to both researchers and practitioners, by showing that theoretical approximation guarantees often cannot be achieved in practice; the contribution is incremental but highlights an important gap.
The paper asks whether theoretically provable neural network approximation rates can be realized by practical algorithms such as stochastic gradient descent. It answers this question negatively by proving hardness results for approximation and integration on neural network approximation spaces, thereby confirming a theory-to-practice gap in deep learning.
We study the computational complexity of (deterministic or randomized) algorithms based on point samples for approximating or integrating functions that can be well approximated by neural networks. Such algorithms (most prominently stochastic gradient descent and its variants) are used extensively in the field of deep learning. One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates by such algorithms. We answer this question in the negative by proving hardness results for the problems of approximation and integration on a novel class of neural network approximation spaces. In particular, our results confirm a conjectured and empirically observed theory-to-practice gap in deep learning. We complement our hardness results by showing that approximation rates of a comparable order of convergence are (at least theoretically) achievable.
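To make the setting concrete, here is a minimal sketch of a randomized algorithm "based on point samples" for the integration problem. It uses plain Monte Carlo integration of a small ReLU network over the unit cube. This is an illustration only, not the authors' construction. The helper names `relu_network` and `monte_carlo_integral` are hypothetical, and the one-hidden-layer network is an assumed example target. The paper's hardness results concern worst-case error over a whole class of functions (the unit ball of an approximation space), not a single network as below. For any square-integrable f, plain Monte Carlo has root-mean-square error equal to the standard deviation of f(X) divided by sqrt(m) for m samples.

```python
import random


def relu_network(x, weights1, biases1, weights2, bias2):
    """Hypothetical helper: evaluate a one-hidden-layer ReLU network
    N(x) = sum_j weights2[j] * relu(<weights1[j], x> + biases1[j]) + bias2."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights1, biases1)
    ]
    return sum(w * h for w, h in zip(weights2, hidden)) + bias2


def monte_carlo_integral(f, dim, num_samples, seed=0):
    """Hypothetical helper: a randomized point-sample algorithm that estimates
    the integral of f over [0, 1]^dim by averaging num_samples evaluations of f
    at i.i.d. uniformly distributed points (plain Monte Carlo)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        point = [rng.random() for _ in range(dim)]
        total += f(point)
    return total / num_samples


if __name__ == "__main__":
    # Assumed example target: f(x) = relu(x - 0.5) on [0, 1], whose exact integral is 0.125.
    f = lambda x: relu_network(x, [[1.0]], [-0.5], [1.0], 0.0)
    for m in (100, 1000, 10000, 100000):
        estimate = monte_carlo_integral(f, dim=1, num_samples=m)
        print(m, estimate, abs(estimate - 0.125))
```

The printed errors should shrink roughly like m^(-1/2) as the number of samples m grows, up to random fluctuation. The algorithm only ever accesses f through point evaluations, which is the access model the abstract refers to.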