Continuous-time Models for Stochastic Optimization Algorithms
This work provides theoretical insights for researchers in optimization and machine learning, but it is incremental as it builds on existing continuous-time frameworks.
The authors tackled the problem of analyzing stochastic optimization algorithms by proposing continuous-time models for methods like mini-batch gradient descent and variance-reduced techniques, resulting in derived convergence bounds for non-convex functions and novel insights into SGD dynamics, such as the effect of decreasing learning rates as time warping.
We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models and Ito calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.