Stochastic Nonconvex Optimization with Large Minibatches
This work addresses the challenge of training deep learning models efficiently, offering incremental improvements in optimization speed and scalability.
The paper tackles stochastic optimization of nonconvex loss functions, such as those that arise in neural network training. It proposes algorithms that use large minibatches, converge faster than minibatch stochastic gradient descent, and parallelize better.
We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms that optimize a series of regularized, nonlinearized losses on large minibatches of samples, using only first-order gradient information. Our algorithms provably converge to an approximate critical point of the expected objective at faster rates than minibatch stochastic gradient descent, and they facilitate better parallelization by allowing larger minibatches.
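To make the idea of optimizing a regularized, nonlinearized loss on each large minibatch concrete, here is a minimal sketch in the spirit of the abstract, not the paper's actual algorithm. It assumes a proximal-style reading: at each outer step, draw a large minibatch and approximately minimize the minibatch loss plus a quadratic penalty that keeps the iterate near the previous one, using only gradient steps. The function names, the toy robust loss, and all hyperparameters (`gamma`, `inner_lr`, step counts, batch size) are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def toy_grad(w, batch):
    """Gradient of a toy nonconvex loss, averaged over the minibatch.

    Per-sample loss (an assumption for illustration):
        sum_j r_j**2 / (1 + r_j**2),  with r = w - x.
    """
    r = w - batch  # shape (batch_size, dim), broadcast over samples
    return np.mean(2.0 * r / (1.0 + r ** 2) ** 2, axis=0)


def minibatch_prox_sketch(grad_fn, w0, data, batch_size, gamma,
                          outer_steps, inner_steps, inner_lr, seed=0):
    """Hypothetical proximal-style loop over large minibatches.

    Each outer step approximately solves
        min_v  f_batch(v) + (gamma / 2) * ||v - w_prev||**2
    with plain gradient descent, i.e. first-order information only.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    n = len(data)
    for _ in range(outer_steps):
        # Draw a large minibatch without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        batch = data[idx]
        anchor = w.copy()
        v = anchor.copy()
        for _ in range(inner_steps):
            # Gradient of the regularized minibatch objective at v.
            g = grad_fn(v, batch) + gamma * (v - anchor)
            v = v - inner_lr * g
        w = v
    return w


# Usage on synthetic data (all values are illustrative).
data_rng = np.random.default_rng(1)
data = 0.5 * data_rng.standard_normal((4096, 5))
w_start = np.full(5, 1.5)
w_final = minibatch_prox_sketch(
    toy_grad, w_start, data,
    batch_size=256, gamma=1.0,
    outer_steps=50, inner_steps=20, inner_lr=0.2,
)
grad_norm_start = np.linalg.norm(toy_grad(w_start, data))
grad_norm_final = np.linalg.norm(toy_grad(w_final, data))
print("gradient norm at start:", grad_norm_start)
print("gradient norm at end:  ", grad_norm_final)
```

The quadratic penalty is what makes each subproblem better conditioned than the raw nonconvex minibatch loss. For this toy loss the per-coordinate curvature is bounded below by -0.5, so `gamma = 1.0` is enough to make every subproblem strongly convex. A small gradient norm on the full dataset serves here as a stand-in for reaching an approximate critical point of the expected objective.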