OC · LG · ML · May 29, 2017

Gradient Descent Can Take Exponential Time to Escape Saddle Points

arXiv:1705.10412v2 · 258 citations
Originality: Highly original
AI Analysis

This paper addresses the efficiency of the optimization algorithms that machine learning practitioners rely on, revealing a fundamental limitation of standard gradient descent.

The paper shows that gradient descent can take exponential time to escape saddle points under natural conditions, whereas perturbed gradient descent achieves polynomial time, highlighting a significant performance gap in non-convex optimization.

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points - it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
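The contrast between plain and perturbed GD can be illustrated with a minimal sketch. It is not the construction or the algorithm from the paper: the toy function f(x, y) = x²/2 − y²/2 + y⁴/4 (saddle at the origin, minima at (0, ±1)), the helper name `steps_to_escape`, the step size, the gradient threshold, and the box-shaped perturbation are all assumptions chosen for illustration. Starting exactly on the saddle's stable manifold (y = 0) is a degenerate case, unlike the natural random initializations the paper analyzes, but it shows the mechanism: the closer the iterate is to that manifold, the longer plain GD lingers, while a small random kick applied when the gradient is small lets it leave quickly.

```python
import random


def grad(x, y):
    # Gradient of the toy function f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.
    return x, y**3 - y


def steps_to_escape(x, y, eta=0.1, perturb_radius=0.0, grad_tol=1e-3,
                    max_steps=10_000, seed=0):
    """Count GD steps until |y| > 0.9, i.e. the iterate is near a minimum.

    With perturb_radius > 0, a uniform random kick is added whenever the
    gradient norm falls below grad_tol (a simplified stand-in for the
    perturbation step of perturbed GD). Returns None if the iterate has
    not escaped within max_steps.
    """
    rng = random.Random(seed)
    for t in range(max_steps):
        if abs(y) > 0.9:
            return t
        gx, gy = grad(x, y)
        if perturb_radius > 0 and (gx * gx + gy * gy) ** 0.5 < grad_tol:
            # Near a stationary point: perturb, then recompute the gradient.
            x += rng.uniform(-perturb_radius, perturb_radius)
            y += rng.uniform(-perturb_radius, perturb_radius)
            gx, gy = grad(x, y)
        x -= eta * gx
        y -= eta * gy
    return None


if __name__ == "__main__":
    print("plain GD, y0 = 0:     ", steps_to_escape(1.0, 0.0))
    print("plain GD, y0 = 1e-12: ", steps_to_escape(1.0, 1e-12))
    print("plain GD, y0 = 1e-3:  ", steps_to_escape(1.0, 1e-3))
    print("perturbed GD, y0 = 0: ", steps_to_escape(1.0, 0.0, perturb_radius=0.01))
```

In this toy problem the escape time of plain GD grows only logarithmically as the starting point approaches the stable manifold. The exponential slowdown established in the paper comes from a more elaborate construction that this sketch does not reproduce.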
