OC · LG · Mar 29, 2022

A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory

arXiv:2203.17226v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work gives researchers in optimization and machine learning a clearer account of a key algorithm. The advance is incremental because it builds on existing optimal control theory.

The paper derives Nesterov's accelerated gradient algorithm from first principles using optimal control theory, giving a theoretical explanation that resolves the algorithm's perceived mystery.

Nesterov's accelerated gradient algorithm is derived from first principles. The first principles are founded on the recently-developed optimal control theory for optimization. This theory frames an optimization problem as an optimal control problem whose trajectories generate various continuous-time algorithms. The algorithmic trajectories satisfy the necessary conditions for optimal control. The necessary conditions produce a controllable dynamical system for accelerated optimization. Stabilizing this system via a quadratic control Lyapunov function generates an ordinary differential equation. An Euler discretization of the resulting differential equation produces Nesterov's algorithm. In this context, this result solves the purported mystery surrounding the algorithm.
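For orientation, the algorithm that the derivation arrives at can be sketched in a few lines. The code below is a minimal illustration of the widely used textbook momentum form of Nesterov's method with a k/(k+3) momentum coefficient, compared against plain gradient descent. It does not reproduce the paper's optimal control derivation, its Lyapunov function, or its specific discretization. The quadratic test function, the step size 1/L, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch only: textbook Nesterov iteration vs. gradient descent
# on an assumed test problem f(x) = 0.5 * sum(a_i * x_i^2).

def f(x, a):
    """Objective value of the assumed diagonal quadratic."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad(x, a):
    """Gradient of the assumed diagonal quadratic."""
    return [ai * xi for ai, xi in zip(a, x)]

def gradient_descent(x0, a, step, iters):
    """Plain gradient descent with a fixed step size."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x, a)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def nesterov(x0, a, step, iters):
    """Nesterov's accelerated gradient method (textbook convex form).

    x_{k+1} = y_k - step * grad f(y_k)
    y_{k+1} = x_{k+1} + k / (k + 3) * (x_{k+1} - x_k)
    """
    x = list(x0)
    y = list(x0)
    for k in range(iters):
        g = grad(y, a)
        x_next = [yi - step * gi for yi, gi in zip(y, g)]
        beta = k / (k + 3)  # momentum coefficient, zero on the first step
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x = x_next
    return x

if __name__ == "__main__":
    a = [1.0, 100.0]        # curvatures; largest one is L = 100
    x0 = [1.0, 1.0]
    step = 1.0 / max(a)     # assumed step size 1/L
    iters = 200
    print("gradient descent:", f(gradient_descent(x0, a, step, iters), a))
    print("nesterov:        ", f(nesterov(x0, a, step, iters), a))
```

On this ill-conditioned problem the accelerated iteration reaches a lower objective value than gradient descent for the same number of gradient evaluations, which is the behavior the paper sets out to explain.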

