From Averaging to Acceleration, There is Only a Step-size
This work provides a theoretical analysis of optimization algorithms, offering insight into their convergence behavior and proposing an improved variant. It is incremental, however, since it builds on existing methods rather than introducing a new paradigm.
The paper analyzes and unifies accelerated gradient descent, averaged gradient descent, and the heavy-ball method for non-strongly-convex optimization. It reformulates all three as constant-parameter second-order difference equations and shows that stability of the resulting system is equivalent to convergence at rate O(1/n^2), with explicit constants. It then extends the result to noisy gradients and proposes an alternative algorithm that combines the benefits of averaging and acceleration.
We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant-parameter second-order difference equation algorithms, for which stability of the system is equivalent to convergence at rate O(1/n^2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available and extend our general convergence result to that setting. This extension suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.
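To make the two families of methods concrete, the following is a minimal sketch that runs averaged gradient descent and accelerated gradient descent on a small diagonal quadratic. It uses the common textbook forms of both methods (uniform averaging of the iterates, and momentum (n-1)/(n+2)), which are not necessarily the parametrization used in the paper. The function names, the test problem, and the step size are all assumptions chosen for illustration.

```python
# Illustrative sketch only: textbook averaged and accelerated gradient descent
# on f(theta) = 0.5 * sum_i h[i] * theta[i]**2, whose minimizer is theta = 0.
# All names and the test problem are hypothetical, not taken from the paper.

def quadratic_value(h, theta):
    """Value of the diagonal quadratic at theta."""
    return 0.5 * sum(hi * ti * ti for hi, ti in zip(h, theta))


def averaged_gd(h, theta0, step, n_iters):
    """Plain gradient descent; returns the uniform average of theta_0..theta_n."""
    theta = list(theta0)
    total = list(theta0)
    for _ in range(n_iters):
        # Gradient of the quadratic is h[i] * theta[i] coordinate-wise.
        theta = [t - step * hi * t for hi, t in zip(h, theta)]
        total = [s + t for s, t in zip(total, theta)]
    return [s / (n_iters + 1) for s in total]


def accelerated_gd(h, theta0, step, n_iters):
    """Nesterov-style acceleration with momentum (n - 1) / (n + 2)."""
    prev = list(theta0)
    cur = list(theta0)
    for n in range(1, n_iters + 1):
        momentum = (n - 1) / (n + 2)
        # Extrapolate, then take a gradient step from the extrapolated point.
        y = [c + momentum * (c - p) for c, p in zip(cur, prev)]
        nxt = [yi - step * hi * yi for hi, yi in zip(h, y)]
        prev, cur = cur, nxt
    return cur


if __name__ == "__main__":
    h = [1.0, 0.1, 0.01, 0.001]   # eigenvalues; largest is L = 1 (assumed)
    theta0 = [1.0, 1.0, 1.0, 1.0]
    step = 1.0                    # step size 1 / L
    for n in (10, 100, 1000):
        f_avg = quadratic_value(h, averaged_gd(h, theta0, step, n))
        f_acc = quadratic_value(h, accelerated_gd(h, theta0, step, n))
        # Printing n^2 * f shows whether the error decays at rate O(1/n^2).
        print(n, n * n * f_avg, n * n * f_acc)
```

Each iterate of both methods depends only on the previous two iterates, which is the second-order structure the abstract refers to. Scaling the function values by n^2 gives a rough empirical check of the O(1/n^2) rate on this particular problem.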