On Uniform Boundedness Properties of SGD and its Momentum Variants
The paper provides theoretical guarantees for the stability of SGD and its momentum variants in machine learning, a practical concern for researchers and practitioners. The contribution is incremental in that it extends existing boundedness results.
The paper addresses the possibility that stochastic gradient descent trajectories escape to infinity. It investigates uniform boundedness properties under smoothness and dissipativity assumptions and shows that common step-size families yield bounded iterates and function values, with applications such as phase retrieval and neural networks.
A theoretical, and potentially also practical, problem with stochastic gradient descent is that trajectories may escape to infinity. In this note, we investigate uniform boundedness properties of iterates and function values along the trajectories of the stochastic gradient descent algorithm and its important momentum variant. Under smoothness and $R$-dissipativity of the loss function, we show that broad families of step-sizes, including the widely used step-decay and cosine with (or without) restart step-sizes, result in uniformly bounded iterates and function values. Several important applications that satisfy these assumptions, including phase retrieval problems, Gaussian mixture models, and some neural network classifiers, are discussed in detail. We further extend the uniform boundedness of SGD and its momentum variant to generalized dissipativity, which covers functions whose tails grow more slowly than quadratic functions. This includes some interesting applications, for example, Bayesian logistic regression and logistic regression with $\ell_1$ regularization.
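The setting described above can be illustrated with a minimal sketch; it is not the paper's experiment or its exact assumptions. Everything named here is hypothetical: the toy loss $f(x) = \sum_i (\tfrac{1}{2} x_i^2 + 2\cos x_i)$, the heavy-ball form of the momentum update, the Gaussian gradient noise, and the schedule parameters (`eta0`, `factor`, `stages`, `cycles`). The toy loss is smooth, nonconvex, and satisfies $\langle \nabla f(x), x \rangle \ge \|x\|^2 - 2\sqrt{d}\,\|x\|$, which is one common form of a dissipativity condition outside a ball and may differ from the paper's definition of $R$-dissipativity.

```python
import math
import random


def step_decay(t, total, eta0=0.1, factor=0.5, stages=4):
    """Step-decay: constant step size within a stage, multiplied by `factor` between stages."""
    stage_len = max(1, total // stages)
    return eta0 * factor ** (t // stage_len)


def cosine_restart(t, total, eta0=0.1, cycles=4):
    """Cosine step size that restarts at eta0 at the start of each cycle."""
    cycle_len = max(1, total // cycles)
    pos = (t % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * pos))


def grad(x):
    """Gradient of the toy loss f(x) = sum_i (0.5 * x_i^2 + 2 * cos(x_i))."""
    return [xi - 2.0 * math.sin(xi) for xi in x]


def sgd_momentum(schedule, steps=2000, beta=0.9, noise=1.0, dim=5, seed=0):
    """Run heavy-ball SGD with noisy gradients; return the largest iterate norm seen."""
    rng = random.Random(seed)
    x = [10.0] * dim  # start far from the origin (assumed initial point)
    v = [0.0] * dim   # momentum buffer
    max_norm = 0.0
    for t in range(steps):
        eta = schedule(t, steps)
        # Stochastic gradient: exact gradient plus independent Gaussian noise.
        g = [gi + noise * rng.gauss(0.0, 1.0) for gi in grad(x)]
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        x = [xi - eta * vi for xi, vi in zip(x, v)]
        max_norm = max(max_norm, math.sqrt(sum(xi * xi for xi in x)))
    return max_norm


if __name__ == "__main__":
    for name, schedule in [("step-decay", step_decay), ("cosine restart", cosine_restart)]:
        print(name, "max iterate norm:", sgd_momentum(schedule))
```

Tracking the maximum norm over the run is a simple empirical proxy for uniform boundedness of the iterates. A single finite run cannot establish such a bound, so the sketch only shows what the quantity of interest looks like for the two step-size families named in the abstract.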