LG · STAT-MECH · PF · ML · Jun 11, 2019

Power Gradient Descent

arXiv:1906.04787v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for faster and more stable optimization algorithms in machine learning. It appears incremental, since it modifies existing gradient descent methods rather than introducing a new paradigm.

The authors target two weaknesses of gradient descent: slow convergence in flat regions and overshooting in steep directions. They introduce a 'power gradient' that rescales each gradient component according to its magnitude, replacing it with its sign-preserving $H$-th power ($0<H<1$). In their tests, this yields significantly better performance for the Nesterov accelerated gradient and AMSGrad.

The development of machine learning is promoting the search for fast and stable minimization algorithms. To this end, we suggest a change in the current gradient descent methods that should speed up the motion in flat regions and slow it down in steep directions of the function to minimize. It is based on a "power gradient", in which each component of the gradient is replaced by its sign-preserving $H$-th power, with $0<H<1$. We test three modern gradient descent methods fed by this variant and by standard gradients, finding the new version to achieve significantly better performance for the Nesterov accelerated gradient and AMSGrad. We also propose an effective new take on the ADAM algorithm, which includes power gradients with varying $H$.
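As a rough illustration of the power gradient described in the abstract, the sketch below reads the sign-preserving ("versus-preserving") $H$-th power as $\mathrm{sign}(g_i)\,|g_i|^H$ applied to each component. The function names `power_gradient` and `power_step`, the learning rate `lr`, and the plain (non-accelerated) update rule are assumptions made for this example; the paper's own tests use Nesterov accelerated gradient, AMSGrad, and an ADAM variant, none of which are reproduced here.

```python
import math


def power_gradient(grad, H=0.5):
    """Return sign(g) * |g|**H for each component g of grad.

    With 0 < H < 1, components with |g| < 1 grow in magnitude and
    components with |g| > 1 shrink, while every sign is preserved.
    """
    if not 0.0 < H < 1.0:
        raise ValueError("H must satisfy 0 < H < 1")
    return [math.copysign(abs(g) ** H, g) for g in grad]


def power_step(x, grad, lr=0.1, H=0.5):
    """One plain gradient-descent step using the power gradient.

    Hypothetical helper for illustration only: x <- x - lr * power_gradient(grad).
    """
    pg = power_gradient(grad, H)
    return [xi - lr * gi for xi, gi in zip(x, pg)]


# A steep component (4.0) is damped and a flat one (-0.25) is amplified:
print(power_gradient([4.0, -0.25, 0.0], H=0.5))  # [2.0, -0.5, 0.0]
```

With $H = 0.5$ the steep component shrinks from 4 to 2 and the flat one grows from 0.25 to 0.5 in magnitude, which matches the stated intent of slowing motion in steep directions and speeding it up in flat regions.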

