LG, OC · Nov 7, 2021

Exponential escape efficiency of SGD from sharp minima in non-stationary regime

arXiv:2111.04004v2 · 5 citations
AI Analysis

The paper addresses a foundational open question in machine learning, namely why SGD generalizes well, and offers an incremental theoretical advance.

The paper asks why stochastic gradient descent (SGD) finds generalizable parameters. Using Large Deviation Theory, it proves that SGD escapes from sharp minima exponentially fast even before reaching a stationary distribution, and it reports experiments consistent with the theory.

We show that stochastic gradient descent (SGD) escapes from sharp minima exponentially fast even before SGD reaches a stationary distribution. SGD has been a de facto standard training algorithm for various machine learning tasks. However, there still exists an open question as to why SGD finds highly generalizable parameters from non-convex target functions, such as the loss function of neural networks. An "escape efficiency" has been an attractive notion to tackle this question, which measures how efficiently SGD escapes from sharp minima with potentially low generalization performance. Despite its importance, the notion has the limitation that it works only when SGD reaches a stationary distribution after sufficient updates. In this paper, we develop a new theory to investigate escape efficiency of SGD with Gaussian noise, by introducing the Large Deviation Theory for dynamical systems. Based on the theory, we prove that the fast escape from sharp minima, named exponential escape, occurs in a non-stationary setting, and that it holds not only for continuous SGD but also for discrete SGD. A key notion for the result is a quantity called "steepness," which describes SGD's stochastic behavior throughout its training process. Our experiments are consistent with our theory.
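To make the notion of escaping a minimum under discrete SGD with Gaussian noise concrete, here is a minimal toy sketch. It is not the paper's experiment or its definition of escape efficiency or steepness. It assumes a one-dimensional quadratic basin L(x) = 0.5 * curvature * x^2, counts an "escape" as the first step at which the loss reaches a fixed barrier height, and assumes the gradient-noise standard deviation scales with the square root of the curvature. The function names `escape_steps` and `mean_escape_steps`, and all parameter values, are hypothetical choices for illustration.

```python
import numpy as np


def escape_steps(curvature, lr=0.01, noise_scale=1.0, barrier=0.1,
                 max_steps=5000, rng=None):
    """Steps until noisy SGD on L(x) = 0.5 * curvature * x**2 first
    reaches loss >= barrier. Returns max_steps if it never does."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(max_steps)  # one Gaussian draw per update
    x = 0.0  # start exactly at the minimum, i.e. far from stationarity
    for step in range(1, max_steps + 1):
        grad = curvature * x
        # ASSUMPTION of this sketch (not taken from the abstract):
        # the gradient-noise std grows like sqrt(curvature).
        noisy_grad = grad + noise_scale * np.sqrt(curvature) * noise[step - 1]
        x -= lr * noisy_grad  # discrete SGD update
        if 0.5 * curvature * x**2 >= barrier:
            return step
    return max_steps


def mean_escape_steps(curvature, n_trials=20, seed=0):
    """Average escape time over independent runs with a fixed seed."""
    rng = np.random.default_rng(seed)
    steps = [escape_steps(curvature, rng=rng) for _ in range(n_trials)]
    return float(np.mean(steps))


sharp = mean_escape_steps(curvature=10.0)  # sharp basin
flat = mean_escape_steps(curvature=1.0)    # flat basin
print(f"sharp: {sharp:.0f} steps, flat: {flat:.0f} steps (capped at 5000)")
```

Under these assumptions the sharp basin is left much sooner than the flat one, which typically hits the step cap. The gap depends heavily on the assumed noise scaling, so the sketch illustrates the setup only and does not verify the paper's result.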

Code Implementations: 1 repo