LG · OC · DATA-AN · ML · Mar 15, 2020

Stochastic gradient descent with random learning rate

arXiv:2003.06926v4 · 6 citations
AI Analysis

This work addresses optimization efficiency in neural network training. It is incremental: it builds on existing stochastic gradient descent methods and changes one ingredient, drawing the learning rate at random.

The paper introduces a uniformly distributed random learning rate into stochastic gradient descent for neural network training. It argues that, in the small learning rate regime, this strategy yields better regularization than cyclic or constant protocols at no extra computational cost, and supports the claim with experiments on the MNIST and CIFAR10 datasets.

We propose to optimize neural networks with a uniformly-distributed random learning rate. The associated stochastic gradient descent algorithm can be approximated by continuous stochastic equations and analyzed within the Fokker-Planck formalism. In the small learning rate regime, the training process is characterized by an effective temperature which depends on the average learning rate, the mini-batch size and the momentum of the optimization algorithm. By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime, yielding better regularization without extra computational cost. We provide supporting evidence through experiments on both shallow, fully-connected and deep, convolutional neural networks for image classification on the MNIST and CIFAR10 datasets.
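
The random learning rate protocol described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the learning rate is redrawn at every step from a uniform distribution on [0, 2 * mean_lr], so that its average equals mean_lr, and it uses a toy quadratic loss in place of a neural network. The names `sgd_random_lr`, `grad_fn`, and `mean_lr` are hypothetical and chosen for illustration only.

```python
import numpy as np

def sgd_random_lr(grad_fn, w0, data, mean_lr=0.05, batch_size=8,
                  momentum=0.0, steps=500, seed=0):
    """SGD with momentum where the learning rate is redrawn at every step.

    Assumption (not taken from the paper): lr ~ Uniform(0, 2 * mean_lr),
    which makes the average learning rate equal to mean_lr.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)          # momentum buffer
    lrs = []
    for _ in range(steps):
        # Sample a mini-batch without replacement.
        idx = rng.choice(len(data), size=batch_size, replace=False)
        # Draw a fresh learning rate for this step.
        lr = rng.uniform(0.0, 2.0 * mean_lr)
        g = grad_fn(w, data[idx])
        v = momentum * v - lr * g
        w = w + v
        lrs.append(lr)
    return w, np.array(lrs)

# Toy problem: loss(w) = 0.5 * mean(||w - x||^2) over the batch,
# whose gradient is w - mean(x); the minimizer is the data mean.
def grad_fn(w, batch):
    return w - batch.mean(axis=0)

data = np.random.default_rng(1).normal(loc=3.0, scale=1.0, size=(256, 2))
w_final, lrs = sgd_random_lr(grad_fn, np.zeros(2), data)
```

Replacing the `rng.uniform` draw with a fixed value or a periodic schedule gives the constant and cyclic protocols that the abstract compares against, with the same per-step cost.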

