On uniform-in-time diffusion approximation for stochastic gradient descent
This work closes a theoretical gap for researchers in optimization and machine learning: it justifies using diffusion approximations to analyze SGD behavior over long time scales. The contribution is incremental in that it extends existing finite-time approximations.
Existing diffusion approximations for stochastic gradient descent (SGD) are valid only on finite time intervals. The paper establishes a uniform-in-time approximation, assuming strong convexity of the expected loss and some mild additional conditions. This enables asymptotic analysis of SGD via continuous SDEs even when the individual random loss functions are not strongly convex.
The diffusion approximation of stochastic gradient descent (SGD) in the current literature is valid only on a finite time interval. In this paper, we establish the uniform-in-time diffusion approximation of SGD, assuming only that the expected loss is strongly convex together with some other mild conditions, and without assuming the convexity of each random loss function. The main technique is to establish exponential decay rates for the derivatives of the solution to the backward Kolmogorov equation. The uniform-in-time approximation allows us to study the asymptotic behavior of SGD via the continuous stochastic differential equation (SDE) even when the random objective function $f(\cdot;\xi)$ is not strongly convex.
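As a rough illustration of what "uniform in time" means, the sketch below compares SGD, $x_{n+1} = x_n - \eta \nabla f(x_n;\xi_n)$, with the first-order SDE commonly used in the diffusion-approximation literature, $dX = -\nabla F(X)\,dt + \sqrt{\eta\,\Sigma(X)}\,dW$, where $F = \mathbb{E} f(\cdot;\xi)$ and $\Sigma$ is the covariance of $\nabla f(\cdot;\xi)$. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional toy loss, the distribution of $\xi = (a, b)$, the exact form of the SDE, and the function names. The quadratic test function $x^2$ is chosen only because its moments can be computed in closed form; it is not necessarily in the class of test functions the result covers.

```python
import math

# Hypothetical toy model (an assumption, not taken from the paper):
#   f(x; xi) = 0.5 * a * x**2 + b * x,  xi = (a, b) independent,
#   a uniform on {-0.5, 2.5}, b uniform on {-1, +1}.
# The expected loss F(x) = 0.5 * x**2 is strongly convex, while
# f(.; xi) is concave whenever a = -0.5.
MEAN_A, VAR_A = 1.0, 2.25   # mean and variance of a
VAR_B = 1.0                 # variance of b (its mean is 0)


def sgd_second_moments(eta, x0, n_steps):
    """Exact E[x_n^2], n = 0..n_steps, for x_{n+1} = x_n - eta*(a_n*x_n + b_n)."""
    # E[(1 - eta*a)^2] = 1 - 2*eta*E[a] + eta^2*E[a^2]; the cross term
    # with b vanishes because b is independent of a and has mean zero.
    contraction = 1.0 - 2.0 * eta * MEAN_A + eta**2 * (VAR_A + MEAN_A**2)
    moments = [x0**2]
    for _ in range(n_steps):
        moments.append(contraction * moments[-1] + eta**2 * VAR_B)
    return moments


def sde_second_moments(eta, x0, n_steps):
    """Exact E[X(n*eta)^2] for dX = -X dt + sqrt(eta*(VAR_A*X^2 + VAR_B)) dW."""
    # Ito's formula gives the linear ODE m' = -rate*m + eta*VAR_B.
    rate = 2.0 * MEAN_A - eta * VAR_A
    m_inf = eta * VAR_B / rate
    return [m_inf + (x0**2 - m_inf) * math.exp(-rate * n * eta)
            for n in range(n_steps + 1)]


def sup_gap(eta, x0, n_steps):
    """Largest |E[x_n^2] - E[X(n*eta)^2]| over n = 0..n_steps."""
    sgd = sgd_second_moments(eta, x0, n_steps)
    sde = sde_second_moments(eta, x0, n_steps)
    return max(abs(u - v) for u, v in zip(sgd, sde))


if __name__ == "__main__":
    # Compare a short horizon with one a hundred times longer.
    for eta in (0.1, 0.05, 0.025):
        print(eta, sup_gap(eta, 1.0, 200), sup_gap(eta, 1.0, 20000))
```

In this toy model the largest gap occurs during the initial transient, so extending the horizon by a factor of one hundred should not increase it, while shrinking the step size $\eta$ should reduce it. A finite-time error bound whose constant grows with the horizon would not by itself rule out growth of the gap over the longer run.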