A Dynamical Systems Perspective on Nesterov Acceleration
This provides theoretical insights into optimization algorithms for machine learning researchers, but it is incremental as it builds on existing dynamical system perspectives.
The paper tackled the problem of understanding Nesterov's accelerated gradient method by deriving it from a dynamical system framework without relying on vanishing step sizes, showing that it arises from discretizing an ODE with a semi-implicit Euler scheme and identifying a curvature-dependent damping term as key to acceleration.
We present a dynamical system framework for understanding Nesterov's accelerated gradient method. In contrast to earlier work, our derivation does not rely on a vanishing step size argument. We show that Nesterov acceleration arises from discretizing an ordinary differential equation with a semi-implicit Euler integration scheme. We analyze both the underlying differential equation as well as the discretization to obtain insights into the phenomenon of acceleration. The analysis suggests that a curvature-dependent damping term lies at the heart of the phenomenon. We further establish connections between the discretized and the continuous-time dynamics.