LG OCMar 6, 2024

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

Princeton

arXiv:2403.04081v214.218 citationsh-index: 14NIPS

Originality Incremental advance

AI Analysis

This work provides incremental improvements in optimization theory for machine learning practitioners by offering more adaptive and theoretically justified step-sizes for gradient methods.

The paper tackled the problem of improving convergence guarantees for gradient descent by introducing directional smoothness, which led to tighter sub-optimality bounds that depend on path-dependent conditioning rather than worst-case constants, with experiments on logistic regression showing these guarantees outperform classical L-smoothness theory.

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on $L$-smoothness.

View on arXiv PDF

Similar