Convergence of continuous-time stochastic gradient descent with applications to deep neural networks
This work advances the theoretical understanding of SGD convergence and is aimed at researchers in optimization and machine learning. Its contribution is incremental, since it builds directly on existing results.
The paper establishes convergence guarantees for continuous-time stochastic gradient descent (SGD) in learning problems. It extends prior results for nonstochastic gradient descent to give general sufficient conditions for convergence, and applies them to the training of overparametrized neural networks.
We study a continuous-time approximation of the stochastic gradient descent process for minimizing the population expected loss in learning problems. The main results establish general sufficient conditions for convergence, extending the results of Chatterjee (2022), which were established for (nonstochastic) gradient descent. We show how the main result can be applied to the case of overparametrized neural network training.
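As an illustration only, the sketch below simulates one common way of writing a continuous-time SGD model, the stochastic differential equation dw_t = -f'(w_t) dt + sqrt(eta) sigma(w_t) dB_t, using an Euler-Maruyama discretization. The quadratic loss f(w) = w^2/2, the noise scale sigma(w) = |w| (chosen so that the noise vanishes at the minimizer, as one might expect when a model can fit the data exactly), and the names `simulate_sgd_sde`, `eta`, and `dt` are all assumptions made for this example. They are not taken from the paper, whose precise model and conditions may differ.

```python
import math
import random


def simulate_sgd_sde(w0, eta, dt, n_steps, seed=0):
    """Euler-Maruyama simulation of dw = -f'(w) dt + sqrt(eta) * sigma(w) dB.

    Assumed toy setting (not from the paper): f(w) = 0.5 * w**2 and
    sigma(w) = |w|, so the noise vanishes at the minimizer w = 0.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    w = w0
    path = [w]
    for _ in range(n_steps):
        grad = w                              # f'(w) for f(w) = 0.5 * w**2
        sigma = abs(w)                        # hypothetical noise scale
        dB = rng.gauss(0.0, math.sqrt(dt))    # Brownian increment over dt
        w = w - grad * dt + math.sqrt(eta) * sigma * dB
        path.append(w)
    return path


path = simulate_sgd_sde(w0=1.0, eta=0.1, dt=0.01, n_steps=2000)
final_loss = 0.5 * path[-1] ** 2
print(f"final loss: {final_loss:.3e}")
```

In this toy setting the drift pulls the iterate toward the minimizer while the noise shrinks along with it, so the simulated loss decays toward zero. With a noise scale that does not vanish at the minimizer, the path would instead keep fluctuating around it.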