Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
This provides incremental improvements for optimization in machine learning by making gradient descent faster and more adaptive without needing global smoothness constants.
The paper tackles the problem of improving gradient descent convergence by using Armijo line-search to adapt step-sizes to local smoothness, showing that it can achieve linear rates for convex logistic regression and multi-class classification, improving over sublinear rates of standard gradient descent.
Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant L and adapts to the ``local'' smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS (GD-LS) can result in constant factor improvements over GD with a 1/L step-size (denoted as GD(1/L)). We strengthen these results and show that if the objective function satisfies a certain non-uniform smoothness condition, GD-LS can result in a faster convergence rate than GD(1/L). In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of GD(1/L). Furthermore, for non-convex objectives satisfying gradient domination (e.g., those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), GD-LS can match the fast convergence of algorithms tailored for these specific settings. Finally, we analyze the convergence of stochastic GD with a stochastic line-search on convex losses under the interpolation assumption.