LG STAT-MECH PF ML Jun 11, 2019

Power Gradient Descent

arXiv:1906.04787v1 1.82 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for faster and more stable optimization algorithms in machine learning, though it appears incremental as it modifies existing gradient descent methods rather than introducing a new paradigm.

The authors address slow convergence in flat regions and overshooting in steep directions of gradient descent by introducing a "power gradient", which replaces each gradient component with its sign-preserving $H$-th power, $0<H<1$. Components with magnitude above 1 shrink and those with magnitude below 1 grow. In the authors' tests, feeding this variant to Nesterov accelerated gradient and AMSGrad gives significantly better performance than feeding them the standard gradient.

The development of machine learning is promoting the search for fast and stable minimization algorithms. To this end, we suggest a change in the current gradient descent methods that should speed up the motion in flat regions and slow it down in steep directions of the function to be minimized. It is based on a "power gradient", in which each component of the gradient is replaced by its sign-preserving $H$-th power, with $0<H<1$. We test three modern gradient descent methods fed by this variant and by standard gradients, finding that the new version achieves significantly better performance for the Nesterov accelerated gradient and AMSGrad. We also propose an effective new take on the ADAM algorithm, which includes power gradients with varying $H$.
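The power-gradient transform described in the abstract can be illustrated with a minimal sketch: each component $g_i$ is mapped to $\operatorname{sign}(g_i)\,|g_i|^H$. The function names `power_gradient` and `descent_step`, the learning rate `lr`, and the use of a plain gradient descent update are assumptions made for illustration. The paper itself feeds power gradients to Nesterov accelerated gradient, AMSGrad, and an ADAM variant, none of which are reproduced here.

```python
import math


def power_gradient(grad, H):
    """Map each component g to sign(g) * |g|**H, with 0 < H < 1.

    Hypothetical helper, not taken from the paper's code.
    """
    if not 0.0 < H < 1.0:
        raise ValueError("H must satisfy 0 < H < 1")
    return [math.copysign(abs(g) ** H, g) for g in grad]


def descent_step(x, grad, lr, H):
    """One plain gradient descent update using the power gradient.

    Assumption: plain descent stands in for the optimizers tested in the paper.
    """
    return [xi - lr * pi for xi, pi in zip(x, power_gradient(grad, H))]


# A steep component (16.0) and a flat one (-0.0625), with H = 0.5.
g = [16.0, -0.0625]
print(power_gradient(g, 0.5))  # [4.0, -0.25]
```

With $H = 0.5$ the steep component shrinks from 16 to 4, while the flat one grows in magnitude from 0.0625 to 0.25. Both keep their sign, which is the rebalancing the abstract describes.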
