LG · AI · MATH-PH · OC · ML · Nov 13, 2023

On non-approximability of zero loss global ${\mathcal L}^2$ minimizers by gradient descent in Deep Learning

arXiv:2311.07065v3 · h-index: 3
Originality: Incremental advance
AI Analysis

This addresses a fundamental limitation in optimization theory for deep learning, highlighting constraints on data requirements for perfect training.

The paper demonstrates that in underparametrized deep learning networks, gradient descent cannot generically achieve zero loss minimization, implying that training data distributions must be non-generic to allow such minimizers.
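The word "generically" can be made concrete with a standard parameter-counting heuristic. The following is a minimal sketch under assumptions. We assume a network with $K$ parameters $\theta \in \mathbb{R}^K$, $N$ fixed training inputs $x_j$ with labels $y_j \in \mathbb{R}^Q$, a smooth output map $f_\theta$, and the $\mathcal{L}^2$ cost written below. The notation $\mathcal{C}$, $F$, $Y$ is ours and is not necessarily the paper's, and this is not the paper's proof.

$$
\mathcal{C}[\theta] = \frac{1}{2N}\sum_{j=1}^{N} \big| f_\theta(x_j) - y_j \big|^2 ,
\qquad
F(\theta) := \big(f_\theta(x_1),\dots,f_\theta(x_N)\big) \in \mathbb{R}^{QN},
\qquad
Y := (y_1,\dots,y_N).
$$

$$
\mathcal{C}[\theta] = 0 \iff F(\theta) = Y,
\qquad
K < QN \;\Rightarrow\; \big|F(\mathbb{R}^K)\big|_{\mathrm{Leb}(\mathbb{R}^{QN})} = 0 .
$$

In the underparametrized case $K < QN$, the image of the smooth map $F$ is a Lebesgue-null set. Consequently, for almost every label vector $Y$ (with the inputs held fixed), no zero-loss minimizer exists. Gradient descent then stalls at points where

$$
\nabla_\theta \mathcal{C}[\theta] = \frac{1}{N}\, DF(\theta)^{T}\big(F(\theta) - Y\big) = 0,
\qquad
F(\theta) - Y \in \big(\operatorname{range} DF(\theta)\big)^{\perp},
$$

and the orthogonal complement has dimension at least $QN - K > 0$. This leaves room for a nonzero residual at critical points.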

We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL), and give a detailed discussion of the circumstance that, in underparametrized DL networks, zero loss minimization generically cannot be attained. As a consequence, we conclude that the distribution of training inputs must necessarily be non-generic in order to produce zero loss minimizers, both for the method constructed in [Chen-Munoz Ewald 2023, 2024] and for gradient descent [Chen 2025] (which assume clustering of training data).
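The contrast between generic and non-generic training data can be seen in a deliberately small toy. This is a hypothetical linear model, not a deep network, and not the construction from the paper or the cited works. It has $K = 2$ parameters (slope `a`, intercept `b`) fitted to $N = 3$ scalar points by plain gradient descent on the mean squared loss. The names `l2_loss`, `gradient_descent`, the step size, and the data are all illustrative assumptions.

For generic (non-collinear) points, gradient descent converges to a strictly positive loss. For points in special position (collinear), it drives the loss to zero.

```python
# Hypothetical toy, NOT the paper's method: an underparametrized linear model
# y ≈ a*x + b (K = 2 parameters) fitted to N = 3 points by gradient descent
# on the mean squared (L^2) loss.

def l2_loss(params, xs, ys):
    """Mean squared error C = (1/N) * sum_j (a*x_j + b - y_j)^2."""
    a, b = params
    n = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, lr=0.05, steps=20000):
    """Minimize l2_loss over (a, b) with fixed-step gradient descent from (0, 0)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


xs = [0.0, 1.0, 2.0]

# Generic labels: the three points are not collinear, so no (a, b) fits them exactly.
generic_ys = [0.0, 1.0, 0.5]
# Non-generic labels: the points lie on the line y = 2x + 1, so zero loss is attainable.
aligned_ys = [1.0, 3.0, 5.0]

generic_params = gradient_descent(xs, generic_ys)
aligned_params = gradient_descent(xs, aligned_ys)

print("generic data, final loss:", l2_loss(generic_params, xs, generic_ys))
print("aligned data, final loss:", l2_loss(aligned_params, xs, aligned_ys))
```

For the generic data, the least-squares optimum is $a = b = 0.25$, with minimal loss $0.125 > 0$. No choice of step size or iteration count can remove this loss, because the label vector lies off the 2-dimensional image of the parameter-to-output map in $\mathbb{R}^3$.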
