LG Mar 31, 2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

arXiv:2103.17182v516.841 citations Has Code

Originality: Incremental advance

AI Analysis

This work aims to improve generalization in deep learning by manipulating gradient noise. It offers an optimizer variant that practitioners can use when training neural networks, though the advance appears incremental because it builds on existing momentum-based methods.

The paper tackles the problem of simulating stochastic gradient noise (SGN) to improve generalization in deep learning, proposing Positive-Negative Momentum (PNM) as a low-cost alternative to conventional momentum methods, with theoretical convergence guarantees and empirical verification showing significant advantages over standard optimizers like SGD with Momentum and Adam.

It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both optimization and generalization of deep networks. Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning. However, it turned out that simple injected random noise does not work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN at low computational cost and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional Momentum in classic optimizers. The PNM method maintains two approximately independent momentum terms. We can then control the magnitude of SGN explicitly by adjusting the momentum difference. We theoretically prove the convergence guarantee and the generalization advantage of PNM over Stochastic Gradient Descent (SGD). By incorporating PNM into two conventional optimizers, SGD with Momentum and Adam, our extensive experiments empirically verify the significant advantage of the PNM-based variants over the corresponding conventional Momentum-based optimizers.
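The abstract describes the mechanism only in words: two approximately independent momentum terms, combined with one positive and one negative weight. The sketch below is one way that idea could look in plain Python. It is not taken from the abstract. The update form, the name `beta0` for the weight on the momentum difference, the `sqrt((1 + beta0)^2 + beta0^2)` normalization, and the helper names `pnm_sgd` and `noisy_quadratic_grad` are all assumptions made for illustration.

```python
import math
import random


def pnm_sgd(grad_fn, theta, lr=0.1, beta1=0.9, beta0=1.0, steps=200):
    """Illustrative positive-negative momentum loop (assumed form).

    grad_fn: maps a list of floats to a (possibly noisy) gradient list.
    beta0:   hypothetical knob; larger values weight the momentum
             difference more heavily, which amplifies gradient noise.
    """
    # Two momentum buffers. Each is refreshed every other step, so the
    # two buffers accumulate disjoint sets of minibatch gradients.
    m_curr = [0.0] * len(theta)
    m_prev = [0.0] * len(theta)
    # Assumed normalization so that changing beta0 does not rescale
    # the overall noise-free step size too aggressively.
    norm = math.sqrt((1.0 + beta0) ** 2 + beta0 ** 2)
    decay = beta1 ** 2  # each buffer decays once per two steps

    for _ in range(steps):
        g = grad_fn(theta)
        # After the swap, m_curr is the buffer last updated two steps ago
        # and m_prev is the buffer updated on the previous step.
        m_curr, m_prev = m_prev, m_curr
        m_curr = [decay * m + (1.0 - decay) * gi for m, gi in zip(m_curr, g)]
        # Positive weight on the fresh buffer, negative on the other one.
        theta = [
            t - (lr / norm) * ((1.0 + beta0) * mc - beta0 * mp)
            for t, mc, mp in zip(theta, m_curr, m_prev)
        ]
    return theta


def noisy_quadratic_grad(theta, noise_std=0.1):
    """Gradient of 0.5 * ||theta||^2 plus Gaussian noise (toy stand-in for SGN)."""
    return [t + random.gauss(0.0, noise_std) for t in theta]
```

Calling `pnm_sgd(noisy_quadratic_grad, [5.0, -3.0])` runs the loop on a toy quadratic. Setting `beta0=0` in this sketch removes the negative term, leaving an ordinary momentum-style update on the alternating buffers, which is a convenient baseline for comparing how much noise the difference term adds.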
