Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes
This work provides incremental improvements in optimization algorithms for machine learning and related fields by automating stepsize tuning and enhancing efficiency in nonconvex and stochastic scenarios.
The authors tackled the problem of minimizing smooth nonconvex functions over convex sets by developing projected gradient methods with auto-conditioned stepsizes, achieving best-known iteration complexities for approximate stationary points without requiring Lipschitz constant inputs or line searches, and extended these to stochastic settings with new complexity bounds.
We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the "vanilla" PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieves the same iteration complexity without requiring the input of the Lipschitz constant of the gradient or any line search procedure. The key idea is to estimate the Lipschitz constant using first-order information gathered from the previous iterations, and to show that the error caused by underestimating the Lipschitz constant can be properly controlled. We then generalize the PG methods to the stochastic setting, by proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method, achieving new complexity bounds in different oracle settings. We also present auto-conditioned stepsize policies for both stochastic PG methods and establish comparable convergence guarantees.