OC LG Mar 29, 2022

A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory

arXiv:2203.17226v1, 6 citations

Originality: Incremental advance

AI Analysis

This work provides a foundational insight for researchers in optimization and machine learning by demystifying a key algorithm, though it is incremental as it builds on existing optimal control theory.

The paper tackles the problem of understanding Nesterov's accelerated gradient algorithm by deriving it from first principles using optimal control theory, resulting in a clear theoretical explanation that resolves the algorithm's perceived mystery.

Nesterov's accelerated gradient algorithm is derived from first principles. The first principles are founded on the recently-developed optimal control theory for optimization. This theory frames an optimization problem as an optimal control problem whose trajectories generate various continuous-time algorithms. The algorithmic trajectories satisfy the necessary conditions for optimal control. The necessary conditions produce a controllable dynamical system for accelerated optimization. Stabilizing this system via a quadratic control Lyapunov function generates an ordinary differential equation. An Euler discretization of the resulting differential equation produces Nesterov's algorithm. In this context, this result solves the purported mystery surrounding the algorithm.
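For orientation, the discrete algorithm that the derivation arrives at can be sketched in its standard textbook form. This is a minimal illustration, not the paper's construction: the abstract does not state the differential equation or the control Lyapunov function, so the momentum coefficient (k - 1)/(k + 2), the step size 1/L, and the quadratic test problem below are all assumptions chosen for the example.

```python
import numpy as np


def nesterov_accelerated_gradient(grad, x0, step, num_iters):
    """Textbook Nesterov iteration for a smooth convex objective.

    Assumed form (not taken from the paper):
        x_k = y_{k-1} - step * grad(y_{k-1})
        y_k = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1})
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for k in range(1, num_iters + 1):
        # Gradient step taken from the extrapolated point y.
        x_next = y - step * grad(y)
        # Momentum extrapolation using the previous two iterates.
        y = x_next + (k - 1) / (k + 2) * (x_next - x)
        x = x_next
    return x


# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 1.0, 1.0])


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad_f(x):
    return A @ x - b


L_smooth = 100.0                    # largest eigenvalue of A
x_star = np.linalg.solve(A, b)      # exact minimizer, for comparison only
x_nag = nesterov_accelerated_gradient(grad_f, np.zeros(3), 1.0 / L_smooth, 500)
gap = f(x_nag) - f(x_star)          # objective gap after 500 iterations
print(gap)
```

The update has the shape of a one-step discretization of a second-order dynamical system: a gradient step followed by an extrapolation along the previous displacement. How the paper's Euler discretization maps onto these two lines is not spelled out in the abstract.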
