OCLGOct 5, 2018

Continuous-time Models for Stochastic Optimization Algorithms

arXiv:1810.02565v340 citations
Originality Incremental advance
AI Analysis

This work provides theoretical insights for researchers in optimization and machine learning, but it is incremental as it builds on existing continuous-time frameworks.

The authors tackled the problem of analyzing stochastic optimization algorithms by proposing continuous-time models for methods like mini-batch gradient descent and variance-reduced techniques, resulting in derived convergence bounds for non-convex functions and novel insights into SGD dynamics, such as the effect of decreasing learning rates as time warping.

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models and Ito calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes