LG · OC · ML · Apr 18, 2023

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

arXiv:2304.09221v2 · 6 citations · h-index: 27
Originality: Synthesis-oriented
AI Analysis

This work addresses the theoretical understanding of optimization in deep learning, but it is incremental as it builds on existing conditions and assumptions.

The paper addresses the convergence of stochastic gradient descent (SGD) in non-convex settings, specifically for deep neural networks. It establishes local convergence with positive probability under a local Łojasiewicz condition and additional structural assumptions, and it provides examples of finite-width networks for which these assumptions hold.

We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee in \cite{chatterjee2022convergence} and an additional local structural assumption on the loss landscape. A key component of our proof is to ensure that the entire SGD trajectory stays inside the local region with positive probability. We also provide examples of finite-width neural networks for which our assumptions hold.
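The idea of a trajectory "staying inside the local region with positive probability" can be illustrated numerically. The following is a minimal sketch and not the paper's construction: the toy objective f(x) = 0.5·|x|², the multiplicative gradient noise, the ball radius, the step size, and all function names are assumptions chosen for illustration. For this objective the Łojasiewicz-type ratio |∇f(x)|² / f(x) is constant, so a lower bound of that kind holds trivially on any ball.

```python
import numpy as np

def loss(x):
    # Toy non-negative objective (assumption): f(x) = 0.5 * |x|^2
    return 0.5 * float(np.dot(x, x))

def grad(x):
    # Exact gradient of the toy objective
    return x

def lojasiewicz_ratio(x):
    # |grad f(x)|^2 / f(x); equals 2 for every x != 0 on this toy objective
    g = grad(x)
    return float(np.dot(g, g)) / loss(x)

def run_sgd(x0, radius, step, n_steps, noise, rng):
    # SGD with multiplicative noise (assumption): the noise vanishes at the
    # minimizer. Returns the last iterate and whether every iterate stayed
    # inside the ball of the given radius around x0.
    x = x0.copy()
    for _ in range(n_steps):
        g = grad(x) * (1.0 + noise * rng.standard_normal(x.shape))
        x = x - step * g
        if np.linalg.norm(x - x0) > radius:
            return x, False
    return x, True

def estimate_stay_probability(n_runs=200, seed=0):
    # Monte Carlo estimate of P(trajectory stays in the ball) and
    # P(trajectory stays in the ball and reaches a near-zero loss).
    rng = np.random.default_rng(seed)
    x0 = np.array([1.0, -0.5])
    stayed = 0
    converged = 0
    for _ in range(n_runs):
        x, inside = run_sgd(x0, radius=2.0, step=0.05,
                            n_steps=500, noise=0.5, rng=rng)
        stayed += inside
        converged += inside and loss(x) < 1e-8
    return stayed / n_runs, converged / n_runs

if __name__ == "__main__":
    p_stay, p_conv = estimate_stay_probability()
    print("fraction staying in the ball:", p_stay)
    print("fraction staying and converging:", p_conv)
```

On harder landscapes the two fractions would be expected to fall below one, which is why a positive-probability statement, rather than an almost-sure one, is the natural form of a local result.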

