Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization
This work addresses the challenge of optimizing nonconvex functions in machine learning, offering theoretical improvements that are incremental but enhance practical applicability.
The paper tackled the problem of finding global optimal solutions for nonconvex functions by analyzing how stochastic gradient descent (SGD) with light- or heavy-tailed noise smooths the objective, and it proposed a new graduated optimization algorithm that varies smoothing by adjusting learning rate and batch size, achieving convergence guarantees for a broader class of nonconvex functions by relaxing the restrictive $\sigma$-nice property.
The graduated optimization approach is a method for finding global optimal solutions for nonconvex functions by using a function smoothing operation with stochastic noise. This paper makes three contributions regarding graduated optimization. First, we extend the definition of function smoothing that is traditionally achieved through convolution with Gaussian noise and characterize for the first time function smoothing with heavy-tailed noise. Second, we show that light- or heavy-tailed stochastic noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function, the degree of which is determined by the learning rate, batch size, and the moment of the stochastic noise. Using this finding, we propose and analyze a new graduated optimization algorithm that varies the degree of smoothing by varying the learning rate and batch size. Third, we relax the $σ$-nice property, a standard but restrictive condition in the analysis of graduated optimization. Our refinement enables convergence guarantees for a broader class of non-convex functions, thereby bridging the gap between theoretical assumptions and practical optimization landscapes.