LG OC ML | May 23, 2018

Predictive Local Smoothness for Stochastic Gradient Methods

Jun Li, Hongfu Liu, Bineng Zhong, Yue Wu, Yun Fu

arXiv:1805.09386v1 | 2.93 citations

Originality: Incremental advance

AI Analysis

This work addresses convergence issues in nonconvex optimization for deep learning, offering incremental improvements to existing stochastic gradient methods.

The authors address the slow asymptotic convergence of stochastic gradient methods, which they attribute to the use of a fixed smoothness value. They propose predictive local smoothness (PLS), which adapts the learning rate to a local smoothness predicted from recent gradients. The resulting variants (PLS-SGD, PLS-AccSGD, and PLS-AMSGrad) are reported to converge faster and to alleviate exploding and vanishing gradients.

Stochastic gradient methods are dominant in nonconvex optimization, especially for deep models, but have slow asymptotic convergence due to the fixed smoothness. To address this problem, we propose a simple yet effective method for improving stochastic gradient methods, named predictive local smoothness (PLS). First, we create a convergence condition to build a learning rate that varies adaptively with local smoothness. Second, the local smoothness can be predicted from the latest gradients. Third, we use the adaptive learning rate to update the stochastic gradients for exploring linear convergence rates. By applying the PLS method, we implement new variants of three popular algorithms: PLS-stochastic gradient descent (PLS-SGD), PLS-accelerated SGD (PLS-AccSGD), and PLS-AMSGrad. Moreover, we provide much simpler proofs to ensure their linear convergence. Empirical results show that the variants perform better than the popular algorithms, with faster convergence and alleviation of exploding and vanishing gradients.
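The abstract does not state the PLS update rule, so the following is only a minimal sketch of the general idea of tying the learning rate to a smoothness value estimated from recent gradients. It assumes a secant-style estimate, L ≈ ||g_t − g_{t−1}|| / ||x_t − x_{t−1}||, a step size of 1/L, and deterministic gradients. These choices, and the names `estimate_local_smoothness` and `adaptive_gradient_descent`, are illustrative assumptions and not the paper's method.

```python
import math


def estimate_local_smoothness(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    """Secant estimate ||g_curr - g_prev|| / ||x_curr - x_prev||.

    Assumption for illustration only; not the PLS predictor from the paper.
    Returns None when the iterates are too close for a reliable estimate.
    """
    dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_curr, x_prev)))
    dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_curr, g_prev)))
    if dx <= eps:
        return None
    return dg / dx


def adaptive_gradient_descent(grad, x0, steps=50, eta0=0.01, eps=1e-12):
    """Gradient descent whose step size is 1 / (estimated local smoothness)."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    eta = eta0
    # Bootstrap with one small fixed step so that two iterates are available.
    x = [xi - eta * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(steps):
        g = grad(x)
        smoothness = estimate_local_smoothness(x_prev, x, g_prev, g, eps)
        # Keep the previous step size if the estimate is unavailable or zero.
        if smoothness is not None and smoothness > eps:
            eta = 1.0 / smoothness
        x_prev, g_prev = x, g
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # f(x) = 1.5 * ||x||^2 has gradient 3x and smoothness constant 3.
    print(adaptive_gradient_descent(lambda v: [3.0 * vi for vi in v], [1.0, -2.0]))
```

On this isotropic quadratic the estimate recovers the true constant after the bootstrap step, so the iterate reaches the minimizer almost immediately. With noisy stochastic gradients or ill-conditioned objectives the raw estimate can be erratic, and a practical scheme would likely need smoothing or bounds on the step size.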
