LG ML, Jul 7, 2019

Stochastic Gradient and Langevin Processes

Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan

arXiv:1907.03215v720.345 citations

Originality: Incremental advance

AI Analysis

The paper provides theoretical insight into SGD convergence for non-convex optimization. The advance is incremental, but it is relevant to machine learning practitioners who train deep neural networks.

The paper quantifies the rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation, in a setting where the noise is non-Gaussian and state-dependent and the potential is non-convex. It shows that these rates depend on the potential function and the second moment of the noise, and it validates the theory with experiments that train deep neural networks on CIFAR-10 using SGD.

We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.
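To make the setting concrete, here is a minimal sketch of a one-dimensional discrete Langevin-like process. It is an illustration under stated assumptions, not the authors' code or experimental setup. The update form x_{k+1} = x_k - delta * grad_U(x_k) + sqrt(delta) * sigma(x_k) * xi_k, the double-well potential, the noise scale `noise_scale`, and all function names are hypothetical choices made for this example. The sketch runs the process once with non-Gaussian (Rademacher) noise and once with Gaussian noise that has the same second moment, then compares a summary statistic of the two long-run samples.

```python
import numpy as np

def grad_U(x):
    # Gradient of the non-convex double-well potential U(x) = (x**2 - 1)**2 / 4.
    # (Assumed potential, chosen only for illustration.)
    return x * (x**2 - 1.0)

def noise_scale(x):
    # Hypothetical state-dependent noise magnitude, bounded in [0.5, 1.5].
    return 1.0 + 0.5 * np.cos(x)

def simulate(noise, delta=0.01, n_steps=2000, n_chains=4000, seed=0):
    """Run n_chains independent copies of the discrete process and
    return their final states as a 1-D array."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        if noise == "rademacher":
            # Non-Gaussian noise: +1 or -1 with equal probability (variance 1).
            xi = rng.choice([-1.0, 1.0], size=n_chains)
        else:
            # Gaussian noise with the same unit variance.
            xi = rng.standard_normal(n_chains)
        x = x - delta * grad_U(x) + np.sqrt(delta) * noise_scale(x) * xi
    return x

x_rademacher = simulate("rademacher", seed=0)
x_gaussian = simulate("gaussian", seed=1)

# Both noise types share the same second moment at every state, so for a
# small step size the two long-run samples should have similar statistics.
print("E[x^2], Rademacher noise:", np.mean(x_rademacher**2))
print("E[x^2], Gaussian noise:  ", np.mean(x_gaussian**2))
```

In this toy example the two estimates should come out close, which is in keeping with the abstract's statement that the key properties depend on the potential and the second moment of the noise. It is a numerical illustration only and does not verify the paper's convergence rates.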
