Quantitative Weak Convergence for Discrete Stochastic Processes
This provides a theoretical foundation for understanding convergence in gradient-based optimization algorithms, which is incremental but important for machine learning practitioners.
The paper tackles the problem of quantifying convergence rates for Langevin-like stochastic processes, including stochastic gradient descent, by proving that iterates converge to an invariant distribution at a rate of $ ilde{O}(1/\sqrt{k})$, which is tight up to log factors, with a reduction to a quantitative Central Limit Theorem for quadratic potentials.
In this paper, we quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms. Under certain regularity assumptions, we show that the iterates of these stochastic processes converge to an invariant distribution at a rate of $\tilde{O}\lrp{1/\sqrt{k}}$ where $k$ is the number of steps; this rate is provably tight up to log factors. Our result reduces to a quantitative form of the classical Central Limit Theorem in the special case when the potential is quadratic.