AEGD: Adaptive Gradient Descent with Energy
This work addresses optimization challenges in machine learning, particularly non-convex problems such as training deep neural networks, by introducing a robust gradient-based method that requires little hyper-parameter tuning. Because it builds on existing gradient descent frameworks, the contribution may be read as incremental.
The authors address the optimization of non-convex objective functions by proposing AEGD, a gradient-based algorithm that dynamically updates an energy variable. The method is unconditionally energy stable and has proven convergence rates in both non-convex and convex settings. In experiments on deep neural networks, it shows comparable or better generalization performance than SGD with momentum.
We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired convergence rates of batch gradient descent. We also provide an energy-dependent bound on the stationary convergence of AEGD in the stochastic non-convex setting. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a large variety of optimization problems: it is robust with respect to initial data and capable of making rapid initial progress. The stochastic AEGD shows comparable and often better generalization performance than SGD with momentum for deep neural networks.
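The abstract does not state the update rule, so the following is only a minimal sketch under assumptions. It assumes an element-wise update in which the energy variable r is initialised to sqrt(f(theta_0) + c), with c chosen so that f + c > 0, and is then divided at each step by 1 + 2 * eta * v^2, where v is the gradient of sqrt(f + c). The function name `aegd_minimize` and the parameters `step_size`, `c`, and `num_steps` are hypothetical and chosen for illustration.

```python
import math


def aegd_minimize(f, grad, theta0, step_size=0.1, c=1.0, num_steps=100):
    """Sketch of an AEGD-style update (assumed form, not given in the abstract)."""
    theta = list(theta0)
    # Energy variable, one entry per coordinate; assumes f(theta) + c > 0.
    r = [math.sqrt(f(theta) + c)] * len(theta)
    energy_history = [list(r)]
    for _ in range(num_steps):
        # v is the gradient of sqrt(f + c), i.e. grad f / (2 * sqrt(f + c)).
        scale = 2.0 * math.sqrt(f(theta) + c)
        v = [g_i / scale for g_i in grad(theta)]
        # Energy update: the divisor is >= 1 for any step size,
        # so each entry of r can only stay the same or shrink.
        r = [r_i / (1.0 + 2.0 * step_size * v_i * v_i) for r_i, v_i in zip(r, v)]
        # Parameter update uses the freshly updated energy.
        theta = [t_i - 2.0 * step_size * r_i * v_i
                 for t_i, r_i, v_i in zip(theta, r, v)]
        energy_history.append(list(r))
    return theta, energy_history


def quadratic(theta):
    return sum(t * t for t in theta)


def quadratic_grad(theta):
    return [2.0 * t for t in theta]
```

Under these assumptions, calling `aegd_minimize(quadratic, quadratic_grad, [3.0, -2.0], step_size=10.0)` should return an energy history that never increases in any coordinate, even though the step size is large. This is one way to read the "unconditionally energy stable" claim. It says nothing on its own about how quickly the objective decreases.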