OC LG ML Mar 14, 2016

On the Influence of Momentum Acceleration on Online Learning

arXiv:1603.04136v4, 64 citations
AI Analysis

This work addresses the impact of momentum acceleration on online learning algorithms and reveals its limitations in noisy environments. The contribution is incremental: it clarifies existing methods rather than introducing new ones.

The paper analyzes momentum stochastic gradient methods in online learning, finding that they are equivalent to standard stochastic gradient methods with a rescaled step-size, and that momentum's benefits for deterministic optimization do not necessarily apply in adaptive online settings with constant step-sizes.

The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.
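
The equivalence claim can be checked numerically on a toy problem. The sketch below is illustrative and is not the paper's experiment. It assumes a heavy-ball form of momentum, a least-mean-squares (quadratic) risk, and a re-scaling factor of 1/(1 - beta), so that momentum with step-size `mu_m` is compared against plain stochastic gradient with step-size `mu_m / (1 - beta)`. The abstract only states that the re-scaling depends on the momentum parameter, so the exact factor, the function name `simulate`, and all parameter values are assumptions made here.

```python
import numpy as np


def simulate(mu_m=0.001, beta=0.9, dim=5, n_iter=40000, sigma_v=0.1, seed=0):
    """Run heavy-ball momentum SGD and plain SGD on the same streaming data.

    Returns two arrays with the squared error ||w_o - w_i||^2 per iteration.
    Assumption: plain SGD uses the re-scaled step-size mu_m / (1 - beta).
    """
    rng = np.random.default_rng(seed)
    w_o = rng.standard_normal(dim)      # unknown model to be learned
    mu = mu_m / (1.0 - beta)            # assumed re-scaled step-size

    w_mom = np.zeros(dim)               # momentum iterate w_{i-1}
    w_mom_prev = np.zeros(dim)          # momentum iterate w_{i-2}
    w_sgd = np.zeros(dim)               # plain SGD iterate

    err_mom = np.empty(n_iter)
    err_sgd = np.empty(n_iter)

    for i in range(n_iter):
        # One streaming sample: d = u^T w_o + noise (persistent gradient noise).
        u = rng.standard_normal(dim)
        d = u @ w_o + sigma_v * rng.standard_normal()

        # Stochastic gradient of 0.5 * (d - u^T w)^2 at each iterate.
        g_mom = -(d - u @ w_mom) * u
        g_sgd = -(d - u @ w_sgd) * u

        # Heavy-ball update: gradient step plus beta times the last move.
        w_next = w_mom - mu_m * g_mom + beta * (w_mom - w_mom_prev)
        w_mom_prev, w_mom = w_mom, w_next

        # Plain SGD update with the larger, re-scaled step-size.
        w_sgd = w_sgd - mu * g_sgd

        err_mom[i] = np.sum((w_o - w_mom) ** 2)
        err_sgd[i] = np.sum((w_o - w_sgd) ** 2)

    return err_mom, err_sgd


if __name__ == "__main__":
    err_mom, err_sgd = simulate()
    tail = len(err_mom) // 2            # discard the transient half
    print("steady-state error, momentum :", err_mom[tail:].mean())
    print("steady-state error, plain SGD:", err_sgd[tail:].mean())
```

If the equivalence holds under these assumptions, the two printed steady-state errors should be of similar size, and plotting `err_mom` against `err_sgd` should show closely matching learning curves during the transient as well.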

