ML · LG · ST · Dec 14, 2021

Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent

arXiv:2112.07110v9
Originality: Incremental advance
AI Analysis

This work provides theoretical insight into how SGD noise drives regularization and escape from low-potential points. It is an incremental advance, building on prior research on M-SGD and noise universality.

The paper analyzes the noise in stochastic gradient descent (SGD). It proves universality results showing that noise classes with the same mean and covariance structure behave similarly. It also shows that, at any fixed point, the error of Multiplicative Stochastic Gradient Descent (M-SGD) is approximately a scaled Gaussian distribution with mean 0.
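A small sketch can illustrate the idea that noise classes with the same mean and covariance behave alike. It assumes an M-SGD-style update in which the stochastic gradient is a random weighted sum $\sum_i w_i \nabla f_i(x)$ of per-sample gradients with $\mathbb{E}[w_i] = 1/n$. The paper's exact noise class and scaling may differ. Minibatching without replacement (batch size $b$ out of $n$) corresponds to $w_i = 1/b$ on the batch and $0$ elsewhere. These weights have covariance $\sigma^2 (I - \mathbf{1}\mathbf{1}^\top/n)$ with $\sigma^2 = (n-b)/(b\,n\,(n-1))$. The sketch draws Gaussian weights with the same mean and covariance. It then checks that, at the minimizer of a hypothetical 1-D least-squares problem, both gradient errors have mean near $0$ and the same variance. The toy problem and all function names are illustrative and do not come from the paper.

```python
import math
import random
import statistics

# Hypothetical toy problem (not from the paper): 1-D least squares with
# per-sample losses f_i(x) = 0.5 * (a_i * x - y_i) ** 2.
def make_problem(n, seed=0):
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 2.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return a, y

def per_sample_grads(x, a, y):
    # g_i(x) = a_i * (a_i * x - y_i)
    return [ai * (ai * x - yi) for ai, yi in zip(a, y)]

def minibatch_weights(n, b, rng):
    # Batch of size b drawn without replacement: w_i = 1/b on the batch, 0 elsewhere.
    w = [0.0] * n
    for i in rng.sample(range(n), b):
        w[i] = 1.0 / b
    return w

def gaussian_weights(n, b, rng):
    # Gaussian weights with mean 1/n and covariance sigma2 * (I - 11^T / n),
    # sigma2 = (n - b) / (b * n * (n - 1)), matching minibatch_weights.
    # Centering z applies the projection I - 11^T / n.
    sigma = math.sqrt((n - b) / (b * n * (n - 1)))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_bar = sum(z) / n
    return [1.0 / n + sigma * (zi - z_bar) for zi in z]

def noise_samples(sampler, x, a, y, b, trials, seed):
    # Gradient error sum_i w_i g_i(x) - mean_i g_i(x) over many draws of w.
    rng = random.Random(seed)
    g = per_sample_grads(x, a, y)
    full = sum(g) / len(g)
    out = []
    for _ in range(trials):
        w = sampler(len(g), b, rng)
        out.append(sum(wi * gi for wi, gi in zip(w, g)) - full)
    return out

def predicted_variance(x, a, y, b):
    # For both weight classes: Var = sigma2 * sum_i (g_i - g_bar) ** 2.
    n = len(a)
    g = per_sample_grads(x, a, y)
    g_bar = sum(g) / n
    sigma2 = (n - b) / (b * n * (n - 1))
    return sigma2 * sum((gi - g_bar) ** 2 for gi in g)

n, b = 20, 5
a, y = make_problem(n)
# Fixed point of full-batch gradient descent, where the mean gradient vanishes.
x_star = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)

mb = noise_samples(minibatch_weights, x_star, a, y, b, 20000, seed=1)
gs = noise_samples(gaussian_weights, x_star, a, y, b, 20000, seed=2)
print("predicted variance:", predicted_variance(x_star, a, y, b))
print("minibatch mean/var:", statistics.fmean(mb), statistics.pvariance(mb))
print("gaussian  mean/var:", statistics.fmean(gs), statistics.pvariance(gs))
```

By construction, both samplers share the predicted variance $\sigma^2 \sum_i (g_i - \bar g)^2$ and differ only in higher moments. The Gaussian-weight error is exactly Gaussian, so comparing the two is loosely the approximation the paper makes precise.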

Past research has indicated that the covariance of the error in Stochastic Gradient Descent (SGD) with minibatching plays a critical role in determining its regularization and its escape from low-potential points. Motivated by recent research in this area, we prove universality results by showing that noise classes with the same mean and covariance structure as SGD via minibatching have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm as introduced in previous work, which has a much more general noise class than SGD via minibatching. We establish non-asymptotic bounds for the M-SGD algorithm in the Wasserstein distance. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm.
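The non-asymptotic Wasserstein bounds compare the laws of M-SGD iterates under different noise classes. This sketch gives a rough empirical analogue and is not the paper's bound or proof technique. It runs many independent M-SGD chains on a hypothetical 1-D least-squares problem, once with minibatch weights and once with Gaussian weights of matching mean and covariance. It then computes the empirical 1-Wasserstein distance between the final iterates. On the real line, $W_1$ between two empirical measures with the same number of atoms is the mean absolute difference of the sorted samples. The weight samplers are repeated so the block runs on its own. The step size, batch size, and run counts are arbitrary choices, and all names are illustrative.

```python
import math
import random
import statistics

# Hypothetical toy setup (not from the paper): 1-D least squares,
# f_i(x) = 0.5 * (a_i * x - y_i) ** 2, optimized with multiplicative-noise SGD.
def make_problem(n, seed=0):
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 2.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return a, y

def minibatch_weights(n, b, rng):
    # w_i = 1/b on a batch of size b drawn without replacement, 0 elsewhere.
    w = [0.0] * n
    for i in rng.sample(range(n), b):
        w[i] = 1.0 / b
    return w

def gaussian_weights(n, b, rng):
    # Mean 1/n and covariance sigma2 * (I - 11^T / n) with
    # sigma2 = (n - b) / (b * n * (n - 1)), matching minibatch_weights.
    sigma = math.sqrt((n - b) / (b * n * (n - 1)))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_bar = sum(z) / n
    return [1.0 / n + sigma * (zi - z_bar) for zi in z]

def m_sgd(sampler, a, y, b, lr, steps, x0, rng):
    # x_{k+1} = x_k - lr * sum_i w_i * g_i(x_k), with a fresh weight vector each step.
    x = x0
    n = len(a)
    for _ in range(steps):
        w = sampler(n, b, rng)
        x -= lr * sum(wi * ai * (ai * x - yi) for wi, ai, yi in zip(w, a, y))
    return x

def final_iterates(sampler, a, y, b, lr, steps, runs, seed):
    # Final iterate of `runs` independent chains, all started at x0 = 0.
    rng = random.Random(seed)
    return [m_sgd(sampler, a, y, b, lr, steps, 0.0, rng) for _ in range(runs)]

def wasserstein1_1d(xs, ys):
    # W1 between two equal-size empirical measures on the real line:
    # mean absolute gap between the sorted samples.
    assert len(xs) == len(ys)
    return sum(abs(u - v) for u, v in zip(sorted(xs), sorted(ys))) / len(xs)

n, b, lr, steps, runs = 10, 4, 0.05, 100, 1000
a, y = make_problem(n)
mb = final_iterates(minibatch_weights, a, y, b, lr, steps, runs, seed=1)
gs = final_iterates(gaussian_weights, a, y, b, lr, steps, runs, seed=2)
print("spread of final iterates (std):", statistics.pstdev(mb))
print("empirical W1(minibatch, gaussian):", wasserstein1_1d(mb, gs))
```

For this quadratic problem, each update is affine in $x$ with coefficients linear in $w$. The mean and variance of every iterate therefore coincide under the two weight classes, and any measured $W_1$ gap reflects higher-order differences plus Monte Carlo error.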
