OCLGMLAug 19, 2019

Second-Order Guarantees of Stochastic Gradient Descent in Non-Convex Optimization

arXiv:1908.07023v128 citations
AI Analysis

This provides theoretical guarantees for optimization algorithms in machine learning, though it appears incremental as it refines existing noise assumptions.

The paper tackles the problem of ensuring efficient escape from saddle-points in non-convex optimization for stochastic gradient descent, showing that a relaxed relative bound on gradient noise variance suffices without extra noise or alternating step-sizes.

Recent years have seen increased interest in performance guarantees of gradient descent algorithms for non-convex optimization. A number of works have uncovered that gradient noise plays a critical role in the ability of gradient descent recursions to efficiently escape saddle-points and reach second-order stationary points. Most available works limit the gradient noise component to be bounded with probability one or sub-Gaussian and leverage concentration inequalities to arrive at high-probability results. We present an alternate approach, relying primarily on mean-square arguments and show that a more relaxed relative bound on the gradient noise variance is sufficient to ensure efficient escape from saddle-points without the need to inject additional noise, employ alternating step-sizes or rely on a global dispersive noise assumption, as long as a gradient noise component is present in a descent direction for every saddle-point.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes