LGOCMLFeb 27, 2020

On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings

arXiv:2002.12414v271 citations
AI Analysis

This addresses convergence issues in stochastic optimization for machine learning practitioners, providing insights into when acceleration methods fail, though it is incremental as it builds on existing theory.

The paper analyzes Nesterov's accelerated gradient method in stochastic settings, showing it converges to a neighborhood of the optimum at an accelerated rate in stochastic approximation but may diverge in finite-sum settings without additional conditions on problem conditioning and data coherence.

We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches). To build better insight into the behavior of Nesterov's method in stochastic settings, we focus throughout on objectives that are smooth, strongly-convex, and twice continuously differentiable. In the stochastic approximation setting, Nesterov's method converges to a neighborhood of the optimal point at the same accelerated rate as in the deterministic setting. Perhaps surprisingly, in the finite-sum setting, we prove that Nesterov's method may diverge with the usual choice of step-size and momentum, unless additional conditions on the problem related to conditioning and data coherence are satisfied. Our results shed light as to why Nesterov's method may fail to converge or achieve acceleration in the finite-sum setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes