On non-approximability of zero loss global ${\mathcal L}^2$ minimizers by gradient descent in Deep Learning
This work addresses a fundamental limitation of gradient descent optimization in deep learning, and highlights the constraints that training data must satisfy for zero loss training to be possible.
The paper demonstrates that in underparametrized deep learning networks, gradient descent generically cannot attain zero loss, which implies that the distribution of training data must be non-generic for zero loss minimizers to exist.
We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL), and give a detailed discussion of the circumstance that in underparametrized DL networks, zero loss minimization generically cannot be attained. As a consequence, we conclude that the distribution of training inputs must necessarily be non-generic in order to produce zero loss minimizers, both for the method constructed in [Chen-Munoz Ewald 2023, 2024] and for gradient descent [Chen 2025] (which assume clustering of training data).
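The underparametrized obstruction can be seen in a toy setting. The following is a minimal sketch and not the construction used in the paper: it assumes a hypothetical one-parameter linear model `f_w(x) = w * x` fitted to three training pairs by plain gradient descent, with made-up data and hyperparameters (`lr`, `steps`). With generic targets the ${\mathcal L}^2$ loss converges to a strictly positive value, while targets chosen to lie exactly in the model's range (a non-generic choice) allow zero loss.

```python
# Toy illustration only (assumed setup, not the paper's method):
# a one-parameter model f_w(x) = w * x fitted to N = 3 training pairs,
# so there are fewer parameters (1) than constraints (3).

def l2_loss(w, xs, ys):
    """Mean squared error of the model w * x on the training pairs."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, w0=0.0, lr=0.05, steps=2000):
    """Plain gradient descent on the L2 loss; returns the final weight."""
    w = w0
    n = len(xs)
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]
generic_ys = [1.0, 0.5, 2.0]        # not a scalar multiple of xs
aligned_ys = [0.5 * x for x in xs]  # non-generic: exactly realizable

w_generic = gradient_descent(xs, generic_ys)
w_aligned = gradient_descent(xs, aligned_ys)

loss_generic = l2_loss(w_generic, xs, generic_ys)
loss_aligned = l2_loss(w_aligned, xs, aligned_ys)

print("generic targets:", w_generic, loss_generic)  # loss stays positive
print("aligned targets:", w_aligned, loss_aligned)  # loss is numerically zero
```

In this sketch the set of targets that admit zero loss is a line inside a three-dimensional space, so a randomly chosen target vector misses it almost surely. This is only a linear analogue of the genericity statement in the abstract.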