LGOCMLNov 15, 2016

The Power of Normalization: Faster Evasion of Saddle Points

arXiv:1611.04831v1115 citations
Originality Incremental advance
AI Analysis

This addresses a key challenge in non-convex optimization for machine learning practitioners, offering a faster method to avoid saddle points, though it is incremental as it builds on existing NGD heuristics.

The paper tackles the problem of saddle point evasion in non-convex optimization by analyzing Normalized Gradient Descent (NGD) with noise injection, showing it provably converges to local minima and improves upon the fastest known first-order algorithm rates.

A commonly used heuristic in non-convex optimization is Normalized Gradient Descent (NGD) - a variant of gradient descent in which only the direction of the gradient is taken into account and its magnitude ignored. We analyze this heuristic and show that with carefully chosen parameters and noise injection, this method can provably evade saddle points. We establish the convergence of NGD to a local minimum, and demonstrate rates which improve upon the fastest known first order algorithm due to Ge e al. (2015). The effectiveness of our method is demonstrated via an application to the problem of online tensor decomposition; a task for which saddle point evasion is known to result in convergence to global minima.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes