OC · LG · Feb 6

RanSOM: Second-Order Momentum with Randomized Scaling for Constrained and Unconstrained Optimization

arXiv:2602.06824v1 · 4.62 citations · h-index: 6

Originality: Highly original

AI Analysis

For machine learning practitioners, this work addresses a fundamental bottleneck in momentum-based optimization, curvature-induced bias, and offers a novel method to improve training efficiency without expensive auxiliary sampling.

The paper tackles curvature-induced bias in momentum methods for deep network training, a bias that limits convergence to suboptimal rates. It proposes RanSOM, a framework that uses randomized step sizes to eliminate this bias and achieve optimal convergence rates under both bounded and heavy-tailed noise.
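One way to see how a randomized step can cancel this bias is a Stein-type identity for the exponential distribution. This is sketched as an illustration, not as the paper's own derivation: it assumes the unconstrained, exponential-step case with mean $\eta_t$, and the symbols $S$ (random step length), $\phi$ (test function) and $d_t$ (update direction) are our notation.

```latex
% S ~ Exp with mean \eta_t, density p(s) = \eta_t^{-1} e^{-s/\eta_t} for s >= 0.
% Assume \phi is differentiable, E|\phi'(S)| < \infty, and \phi(s) e^{-s/\eta_t} -> 0 as s -> \infty.
\begin{align*}
\mathbb{E}[\phi'(S)]
  &= \int_0^\infty \phi'(s)\,\eta_t^{-1} e^{-s/\eta_t}\,ds \\
  &= \Big[\phi(s)\,\eta_t^{-1} e^{-s/\eta_t}\Big]_0^\infty
     + \eta_t^{-2}\int_0^\infty \phi(s)\,e^{-s/\eta_t}\,ds \\
  &= \eta_t^{-1}\big(\mathbb{E}[\phi(S)] - \phi(0)\big).
\end{align*}
% Take \phi(s) = \nabla f(x_t - s d_t), so \phi'(s) = -\nabla^2 f(x_t - s d_t) d_t,
% and let the randomized step be x_{t+1} = x_t - S d_t:
\[
\mathbb{E}_S\big[\nabla f(x_{t+1})\big] - \nabla f(x_t)
  = -\eta_t\,\mathbb{E}_S\big[\nabla^2 f(x_{t+1})\,d_t\big].
\]
```

Read this way, $-\eta_t \nabla^2 f(x_{t+1})\,d_t$ requires only a Hessian-vector product at the point where the new gradient is taken, and it is an unbiased estimate of the expected gradient drift that a plain momentum buffer ignores. How the beta-distributed variant adapts this idea is not covered by this sketch.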

Momentum methods, such as Polyak's Heavy Ball, are the standard for training deep networks but suffer from curvature-induced bias in stochastic settings, limiting convergence to suboptimal $\mathcal{O}(\varepsilon^{-4})$ rates. Existing corrections typically require expensive auxiliary sampling or restrictive smoothness assumptions. We propose RanSOM, a unified framework that eliminates this bias by replacing deterministic step sizes with randomized steps drawn from distributions with mean $\eta_t$. This modification allows us to leverage Stein-type identities to compute an exact, unbiased estimate of the momentum bias using a single Hessian-vector product computed jointly with the gradient, avoiding auxiliary queries. We instantiate this framework in two algorithms: RanSOM-E for unconstrained optimization (using exponentially distributed steps) and RanSOM-B for constrained optimization (using beta-distributed steps to strictly preserve feasibility). Theoretical analysis confirms that RanSOM recovers the optimal $\mathcal{O}(\varepsilon^{-3})$ convergence rate under standard bounded noise and achieves optimal rates for heavy-tailed noise settings ($p \in (1, 2]$) without requiring gradient clipping.
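A quick numerical check of the exponential-step identity behind this idea, offered as a sketch under stated assumptions: the toy objective $f(x) = \log(1 + \|x\|^2)$, the helpers `grad`, `hvp` and `stein_check`, and the fixed direction `d` are hypothetical choices for illustration, not code from the paper. It draws step lengths $S$ with mean $\eta$ via NumPy's `Generator.exponential` (whose `scale` argument is the mean), takes one randomized step per sample, and compares both sides of $\mathbb{E}_S[\nabla f(x - S d)] - \nabla f(x) = -\eta\,\mathbb{E}_S[\nabla^2 f(x - S d)\,d]$.

```python
import numpy as np


def grad(x):
    """Gradient of the toy objective f(x) = log(1 + ||x||^2), applied row-wise."""
    r = np.sum(x * x, axis=-1, keepdims=True)
    return 2.0 * x / (1.0 + r)


def hvp(x, v):
    """Hessian-vector product of the toy objective at each row of x, applied to v.

    Hessian: 2 I / (1 + r) - 4 x x^T / (1 + r)^2, with r = ||x||^2.
    """
    r = np.sum(x * x, axis=-1, keepdims=True)
    xv = np.sum(x * v, axis=-1, keepdims=True)
    return 2.0 * v / (1.0 + r) - 4.0 * x * xv / (1.0 + r) ** 2


def stein_check(x, d, eta, n_samples=200_000, seed=0):
    """Monte Carlo estimates of both sides of
    E[grad(x - S d)] - grad(x) = -eta * E[hvp(x - S d, d)],  S ~ Exp(mean eta).
    """
    rng = np.random.default_rng(seed)
    # NumPy's `scale` is the mean of the exponential distribution.
    s = rng.exponential(scale=eta, size=(n_samples, 1))
    # One randomized step per sample; gradient and HVP share this point.
    x_new = x[None, :] - s * d[None, :]
    lhs = grad(x_new).mean(axis=0) - grad(x[None, :])[0]
    rhs = -eta * hvp(x_new, d).mean(axis=0)
    return lhs, rhs


if __name__ == "__main__":
    x0 = np.array([1.0, -0.5])
    d0 = np.array([0.8, 0.3])
    lhs, rhs = stein_check(x0, d0, eta=0.5)
    print("E[grad(x - S d)] - grad(x):", lhs)
    print("-eta * E[hvp(x - S d, d)]: ", rhs)
```

With these inputs the two printed vectors should agree up to Monte Carlo error of a few thousandths. The design point the check illustrates is that the Hessian-vector product is evaluated at the same random iterate as the gradient, so no extra query at an interpolated point is needed.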

