Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
This work addresses a fundamental open problem in optimization that matters to machine learning practitioners, and it provides theoretical insight into SGD behavior near saddle points. The contribution is incremental, however, in that it builds on existing concepts from ergodic theory.
The authors study the dynamics of stochastic gradient descent (SGD) around saddle points in neural networks. They identify Type-II saddles as particularly hard to escape because the gradient noise vanishes at the saddle, and they show that the SGD dynamics near such saddles fall into four phases determined by the signal-to-noise ratio of the gradient.
Characterizing and understanding the dynamics of stochastic gradient descent (SGD) around saddle points remains an open problem. We first show that saddle points in neural networks can be divided into two types, among which the Type-II saddles are especially difficult to escape from because the gradient noise vanishes at the saddle. The dynamics of SGD around these saddles are therefore described, to leading order, by a random matrix product process, which makes probabilistic stability and the related Lyapunov exponent natural tools for studying them. Theoretically, we link the study of SGD dynamics to well-known concepts in ergodic theory, which we leverage to show that saddle points can be either attractive or repulsive for SGD, and that the SGD dynamics can be classified into four different phases, depending on the signal-to-noise ratio in the gradient close to the saddle.
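To make the role of the Lyapunov exponent concrete, the following is a minimal sketch and not the authors' construction. It assumes a one-dimensional linearization around the saddle, theta_{t+1} = (1 - lr * h_t) * theta_t, in which the random matrix product reduces to a product of scalars and the Lyapunov exponent is E[log|1 - lr * h_t|]. The Gaussian model for the per-step curvature h_t, the function name lyapunov_exponent, and all parameter values are hypothetical choices made for illustration. Under this convention, a negative exponent means the iterate contracts toward the saddle in probability (attractive), and a positive exponent means it escapes (repulsive).

```python
import math
import random


def lyapunov_exponent(lr, mean_curv, noise_std, steps=100_000, seed=0):
    """Monte Carlo estimate of E[log|1 - lr * h|].

    Assumption (not from the source): the per-step curvature h is drawn
    i.i.d. from Normal(mean_curv, noise_std). A negative mean_curv models
    an escape direction of the saddle under full-batch gradient descent.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        h = rng.gauss(mean_curv, noise_std)
        factor = abs(1.0 - lr * h)
        # Guard against log(0) in the measure-zero case factor == 0.
        total += math.log(max(factor, 1e-300))
    return total / steps


if __name__ == "__main__":
    # Illustrative values only: same mean curvature, different noise levels.
    weak_noise = lyapunov_exponent(lr=0.1, mean_curv=-1.0, noise_std=0.0)
    strong_noise = lyapunov_exponent(lr=0.1, mean_curv=-1.0, noise_std=10.0)
    print("noise-free exponent:", weak_noise)
    print("strong-noise exponent:", strong_noise)
```

In this toy setting the noise-free exponent equals log(1.1) and is positive, so the saddle repels the iterate, while the strong-noise estimate should come out negative, so the same saddle becomes attractive. This is only meant to illustrate how the balance between mean curvature (signal) and its fluctuation (noise) can flip the sign of the exponent; it does not reproduce the four-phase classification.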