ML LG OCOct 20, 2017

First-order Methods Almost Always Avoid Saddle Points

Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht

arXiv:1710.07406v126.8106 citations

Originality Incremental advance

AI Analysis

This addresses the problem of optimization convergence in machine learning by showing that common algorithms inherently avoid saddle points, which is incremental but broad in scope.

The paper proves that first-order methods, such as gradient descent, avoid saddle points for almost all initializations, without needing second-order derivatives or randomness beyond initialization.

We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.

View on arXiv PDF

Similar