OC DSMay 29

Acceleration by Random Stepsizes: Hedging, Equalization, and the Arcsine Stepsize Schedule

arXiv:2412.0579044.99 citationsh-index: 62

Predicted impact top 16% in OC · last 90 daysOriginality Highly original

AI Analysis

Provides a simple, theory-driven method to accelerate gradient descent for convex optimization, potentially impacting optimization in machine learning.

Random stepsizes from the Arcsine distribution accelerate Gradient Descent for separable convex optimization, improving the convergence rate from O(k) to O(√k) without momentum or other modifications.

We show that for separable convex optimization, random stepsizes fully accelerate Gradient Descent. Specifically, using inverse stepsizes i.i.d. from the Arcsine distribution improves the convergence rate from $O(k)$ to $O(\sqrt{k})$, where $k$ is the condition number. No momentum or other algorithmic modifications are required. Our starting point is a remarkable "equalization property" of the Arcsine distribution: it yields an identical convergence rate for all quadratic functions. A key technical insight is that martingale arguments extend this phenomenon to all separable convex functions. We interpret this equalization as an extreme form of hedging: by using this random distribution over stepsizes, Gradient Descent converges at exactly the same rate for all functions in the function class.

View on arXiv PDF

Similar