Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent
This work proposes a new framework for gradient-based optimization algorithms that could make machine-learning training more efficient, although the contribution appears incremental because it builds on existing methods.
The paper tackles the problem of unifying and interpreting gradient-based optimization methods, and shows that a new algorithm, Regularised Gradient Descent, can converge more quickly than classical momentum or Nesterov's accelerated gradient.
We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods. As natural special cases we re-derive classical momentum and Nesterov's accelerated gradient method, lending a new intuitive interpretation to the latter algorithm. We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nesterov's algorithm or the classical momentum algorithm.
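The abstract names classical momentum and Nesterov's accelerated gradient as special cases of the framework. For reference, here is a minimal sketch of the conventional textbook update rules for both methods. It is not the paper's derivation, and it does not implement Regularised Gradient Descent. The quadratic objective, the step size `lr`, the momentum coefficient `mu` and the helper names (`momentum_step`, `optimise`) are illustrative assumptions. The two methods differ only in where the gradient is evaluated.

```python
from typing import Callable, List, Tuple

Vector = List[float]

# Illustrative objective (an assumption, not from the paper):
# an ill-conditioned quadratic f(theta) = 0.5 * sum(c_i * theta_i**2).
CURVATURES: Vector = [1.0, 50.0]


def loss(theta: Vector) -> float:
    return 0.5 * sum(c * t * t for c, t in zip(CURVATURES, theta))


def grad(theta: Vector) -> Vector:
    return [c * t for c, t in zip(CURVATURES, theta)]


def momentum_step(
    theta: Vector,
    velocity: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float,
    mu: float,
    nesterov: bool,
) -> Tuple[Vector, Vector]:
    """One update of classical momentum or Nesterov's accelerated gradient."""
    if nesterov:
        # Nesterov: evaluate the gradient at the look-ahead point theta + mu * v.
        point = [t + mu * v for t, v in zip(theta, velocity)]
    else:
        # Classical momentum: evaluate the gradient at the current point.
        point = theta
    g = grad_fn(point)
    # v_{t+1} = mu * v_t - lr * g ;  theta_{t+1} = theta_t + v_{t+1}
    new_velocity = [mu * v - lr * gi for v, gi in zip(velocity, g)]
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


def optimise(nesterov: bool, steps: int = 100, lr: float = 0.01, mu: float = 0.9) -> float:
    """Run the chosen method from theta = (1, 1) and return the final loss."""
    theta: Vector = [1.0, 1.0]
    velocity: Vector = [0.0, 0.0]
    for _ in range(steps):
        theta, velocity = momentum_step(theta, velocity, grad, lr, mu, nesterov)
    return loss(theta)


if __name__ == "__main__":
    print("classical momentum:", optimise(nesterov=False))
    print("Nesterov:          ", optimise(nesterov=True))
```

Setting `mu = 0` reduces both rules to plain gradient descent. To try another objective, replace `grad` and `loss`. The sketch makes no claim about which method converges faster in general.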