LG OC ML · Apr 18, 2023

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

arXiv:2304.09221v2 · 7.76 citations · h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work advances the theoretical understanding of optimization in deep learning, but it is incremental, since it builds on an existing convergence condition and adds further assumptions.

The paper tackles the problem of proving convergence of stochastic gradient descent (SGD) in non-convex settings, specifically for deep neural networks. It establishes local convergence with positive probability under a local Łojasiewicz condition plus additional structural assumptions, and gives examples of finite-width networks for which these assumptions hold.
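For reference, here is a sketch of the local Łojasiewicz condition as we understand it from Chatterjee's earlier work on gradient descent; the exact form used in this paper may differ, and the notation (f, x_0, r, alpha) is ours. The second half is the standard gradient-flow calculation showing why the condition keeps a deterministic trajectory inside the ball B(x_0, r). The paper's stated contribution is an analogous, positive-probability statement for SGD, which this calculation does not cover.

```latex
% Assumed setting (our notation): f : \mathbb{R}^p \to [0, \infty) is the training loss,
% x_0 is the initialization, and B(x_0, r) is the closed ball of radius r around x_0.
\alpha(x_0, r) := \inf_{x \in B(x_0, r),\ f(x) \neq 0}
  \frac{\lVert \nabla f(x) \rVert^2}{f(x)},
\qquad \text{condition:}\quad 4 f(x_0) < r^2 \, \alpha(x_0, r).

% Why the factor 4: along gradient flow \dot{x}(t) = -\nabla f(x(t)), and while
% x(t) stays in B(x_0, r) so that \lVert \nabla f \rVert \ge \sqrt{\alpha f},
\frac{d}{dt} \sqrt{f(x(t))}
  = -\frac{\lVert \nabla f(x(t)) \rVert^2}{2 \sqrt{f(x(t))}}
  \le -\frac{\sqrt{\alpha(x_0, r)}}{2} \, \lVert \dot{x}(t) \rVert .

% Integrating in t bounds the total path length of the trajectory:
\int_0^\infty \lVert \dot{x}(t) \rVert \, dt
  \le 2 \sqrt{\frac{f(x_0)}{\alpha(x_0, r)}},
\qquad
2 \sqrt{\frac{f(x_0)}{\alpha(x_0, r)}} < r
  \iff 4 f(x_0) < r^2 \, \alpha(x_0, r).
```

A continuity argument then closes the loop: the path can never reach the boundary of the ball, so the lower bound on the gradient holds for all time.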

We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee in \cite{chatterjee2022convergence} and an additional local structural assumption on the loss landscape. A key component of our proof is to ensure that the entire SGD trajectory stays inside the local region with positive probability. We also provide examples of finite-width neural networks for which our assumptions hold.
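The abstract singles out one event: the whole SGD trajectory remaining inside a local region around the initialization. The sketch below illustrates that event numerically with a Monte Carlo estimate. It is a toy of our own choosing, not the paper's construction: an over-parameterized least-squares problem (3 equations, 4 unknowns), which is convex, unlike the deep networks the paper treats. The helper names (`run_sgd`, `stay_probability`, `lojasiewicz_ratio`), the step size, radii, and trial counts are all hypothetical.

```python
import math
import random

# Hypothetical toy problem: 3 linear equations in 4 unknowns, so a zero-loss point exists.
A = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
b = [1.0, 2.0, 3.0]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def loss(x):
    """Full-batch loss f(x) = (1/2n) * sum_i (a_i . x - b_i)^2."""
    return sum((dot(a, x) - bi) ** 2 for a, bi in zip(A, b)) / (2 * len(A))


def full_grad(x):
    """Gradient of f: (1/n) * sum_i (a_i . x - b_i) * a_i."""
    g = [0.0] * len(x)
    for a, bi in zip(A, b):
        r = dot(a, x) - bi
        g = [gj + r * aj / len(A) for gj, aj in zip(g, a)]
    return g


def lojasiewicz_ratio(x):
    """|grad f(x)|^2 / f(x) at one point (requires f(x) > 0).

    Its infimum over a ball B(x0, r) is the alpha(x0, r) of the local condition.
    """
    g = full_grad(x)
    return dot(g, g) / loss(x)


def run_sgd(x0, eta, steps, rng):
    """Plain SGD: each step samples one equation uniformly and takes a gradient
    step on 0.5 * (a_i . x - b_i)^2, an unbiased estimate of grad f.

    Returns the final iterate and the largest distance from x0 along the whole path.
    """
    x = list(x0)
    max_dist = 0.0
    for _ in range(steps):
        i = rng.randrange(len(A))
        r = dot(A[i], x) - b[i]  # residual of the sampled equation
        x = [xj - eta * r * aj for xj, aj in zip(x, A[i])]
        dist = math.sqrt(sum((xj - x0j) ** 2 for xj, x0j in zip(x, x0)))
        max_dist = max(max_dist, dist)
    return x, max_dist


def stay_probability(x0, eta, steps, radius, trials, seed=0):
    """Monte Carlo estimate of P(the entire trajectory stays inside B(x0, radius))."""
    rng = random.Random(seed)
    stayed = sum(run_sgd(x0, eta, steps, rng)[1] < radius for _ in range(trials))
    return stayed / trials


if __name__ == "__main__":
    x0 = [0.0, 0.0, 0.0, 0.0]
    x_T, max_dist = run_sgd(x0, eta=0.1, steps=1000, rng=random.Random(0))
    print("loss at x0:", loss(x0), "loss at x_T:", loss(x_T))
    print("ratio at x0:", lojasiewicz_ratio(x0), "max distance:", max_dist)
    for radius in (1.0, 2.5, 5.0):
        print("radius", radius, "stay prob.", stay_probability(x0, 0.1, 1000, radius, trials=20))
```

On this toy problem the ratio |grad f|^2 / f is at least 2/3 wherever f is nonzero (twice the smallest nonzero eigenvalue of A^T A / n), and since eta * |a_i|^2 = 0.2 < 2, no SGD step increases the distance to the minimum-norm solution x* = (-0.5, 0.5, 1.5, 1.5), whose norm is sqrt(5). Starting from x0 = 0, every trajectory therefore stays within 2 * sqrt(5) of x0, so the estimate is 1 for radius 5, and it is 0 for radius 1 because the iterates converge to a point at distance sqrt(5).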
