LGOCFeb 15, 2023

Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the Bounded Gradient Assumption

arXiv:2302.07862v11 citationsh-index: 5
Originality Incremental advance
AI Analysis

This provides theoretical guarantees for avoiding saddle points in optimization, which is crucial for training neural networks, though it is incremental by extending existing results to more methods and assumptions.

The paper proves that stochastic gradient descent methods, including SGD, SHB, and SNAG, almost surely avoid strict saddle manifolds, with the analysis removing the need for bounded gradients and uniformly bounded noise, introducing a more practical local boundedness assumption applicable to neural network training.

We prove that various stochastic gradient descent methods, including the stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods, almost surely avoid any strict saddle manifold. To the best of our knowledge, this is the first time such results are obtained for SHB and SNAG methods. Moreover, our analysis expands upon previous studies on SGD by removing the need for bounded gradients of the objective function and uniformly bounded noise. Instead, we introduce a more practical local boundedness assumption for the noisy gradient, which is naturally satisfied in empirical risk minimization problems typically seen in training of neural networks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes