LG · Dec 10, 2025

Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation

arXiv:2512.10033v1
Originality: Incremental advance
AI Analysis

HB-SGE offers a robust alternative for optimization in machine learning, particularly for practitioners facing ill-conditioned or non-convex loss landscapes, though it is an incremental advance that builds on existing momentum methods.

The paper addresses the tendency of accelerated gradient methods to diverge on ill-conditioned or non-convex landscapes. It proposes HB-SGE, which combines heavy-ball momentum with predictive gradient extrapolation. On ill-conditioned quadratics, HB-SGE converges in 119 iterations where SGD and NAG diverge; on the non-convex Rosenbrock function, it converges in 2,718 iterations, whereas classical momentum methods diverge within 10 steps.
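The two benchmark problems named above can be set up in a few lines. This is a minimal sketch under stated assumptions: the summary gives only the condition number ($\kappa = 50$), so the quadratic here is a hypothetical diagonal choice $A = \mathrm{diag}(1, 50)$, and the Rosenbrock function uses the standard constants, $(1 - x)^2 + 100(y - x^2)^2$. The helper names (`make_quadratic`, `gradient_descent`, etc.) are illustrative. The plain gradient-descent loop only shows the textbook stability limit `lr < 2 / L` (here $L = 50$) on a quadratic; it does not reproduce the paper's iteration counts.

```python
import numpy as np


def make_quadratic(kappa=50.0):
    """Hypothetical ill-conditioned quadratic f(x) = 0.5 * x^T A x, A = diag(1, kappa).

    Assumption: only the condition number is known, so A is chosen diagonal.
    Its largest eigenvalue (the smoothness constant L) equals kappa.
    """
    A = np.diag([1.0, kappa])

    def f(x):
        return 0.5 * x @ A @ x

    def grad(x):
        return A @ x

    return f, grad


def rosenbrock(x):
    """Standard 2-D Rosenbrock function; global minimum f = 0 at (1, 1)."""
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_grad(x):
    """Analytic gradient of the Rosenbrock function above."""
    dx = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2)
    dy = 200.0 * (x[1] - x[0] ** 2)
    return np.array([dx, dy])


def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent: x <- x - lr * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


if __name__ == "__main__":
    f, grad = make_quadratic(50.0)
    # Stable: lr = 0.03 < 2 / L = 0.04, so every coordinate contracts.
    print(gradient_descent(grad, [1.0, 1.0], lr=0.03, steps=500))
    # Unstable: lr = 0.05 > 0.04, the stiff coordinate is scaled by -1.5 per step.
    print(gradient_descent(grad, [1.0, 1.0], lr=0.05, steps=100))
```

With `lr = 0.03`, below $2/50 = 0.04$, both coordinates contract toward the minimizer at the origin. With `lr = 0.05`, the stiff coordinate is multiplied by $1 - 0.05 \cdot 50 = -1.5$ at every step and grows without bound, which is the basic way ill-conditioning punishes aggressive steps.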

Accelerated gradient methods such as Nesterov's Accelerated Gradient (NAG) converge faster on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes because of aggressive momentum accumulation. We propose Heavy-Ball Synthetic Gradient Extrapolation (HB-SGE), a robust first-order method that combines heavy-ball momentum with predictive gradient extrapolation. Unlike classical momentum methods, which accumulate historical gradients, HB-SGE estimates future gradient directions using local Taylor approximations, providing adaptive acceleration while maintaining stability. We prove convergence guarantees for strongly convex functions and demonstrate empirically that HB-SGE prevents divergence on problems where NAG and standard momentum fail. On ill-conditioned quadratics (condition number $\kappa = 50$), HB-SGE converges in 119 iterations while both SGD and NAG diverge. On the non-convex Rosenbrock function, HB-SGE converges in 2,718 iterations where classical momentum methods diverge within 10 steps. While NAG remains faster on well-conditioned problems, HB-SGE provides a robust alternative that achieves a speedup over SGD across diverse landscapes, requiring only $O(d)$ memory overhead and the same hyperparameters as standard momentum.
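The abstract describes the update only in words: heavy-ball momentum, a gradient predicted ahead of time from a local Taylor approximation, $O(d)$ extra memory, and the usual momentum hyperparameters. One hypothetical way to instantiate that description, assumed here rather than taken from the paper, is to predict the gradient at the look-ahead point $x_t + \beta v_t$ with a first-order Taylor step, $g(x_t + \beta v_t) \approx g_t + \beta H v_t$, and to replace the Hessian-vector product with the gradient difference $g_t - g_{t-1} \approx H v_t$. The function `hb_extrapolated` and its parameters `lr` and `beta` are illustrative names, not the paper's.

```python
import numpy as np


def hb_extrapolated(grad, x0, lr=0.01, beta=0.9, steps=1000):
    """Heavy-ball momentum driven by a Taylor-predicted look-ahead gradient.

    Hypothetical sketch, NOT the paper's verified HB-SGE update.
    The gradient at the look-ahead point x_t + beta * v_t is predicted as
        g(x_t + beta * v_t) ~= g_t + beta * H v_t ~= g_t + beta * (g_t - g_prev),
    where the finite difference g_t - g_prev ~= H (x_t - x_{t-1}) = H v_t
    stands in for the Hessian-vector product. Each loop iteration evaluates
    one gradient, and the extra state (v, g_prev) costs O(d) memory.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)  # velocity; equals x_t - x_{t-1} once stepping starts
    g_prev = grad(x)      # no history yet, so the first prediction is just g_0
    for _ in range(steps):
        g = grad(x)
        g_pred = g + beta * (g - g_prev)  # first-order Taylor prediction
        v = beta * v - lr * g_pred        # heavy-ball velocity update
        x = x + v
        g_prev = g
    return x


if __name__ == "__main__":
    A = np.diag([1.0, 50.0])  # hypothetical quadratic with condition number 50
    x_final = hb_extrapolated(lambda x: A @ x, [1.0, 1.0], lr=0.01, beta=0.9, steps=1000)
    print(np.linalg.norm(x_final))
```

A design caveat: on an exactly quadratic objective the finite difference is exact ($g_t - g_{t-1} = A v_t$), so this sketch produces the same iterates as Nesterov momentum in look-ahead form, $v \leftarrow \beta v - \eta \nabla f(x + \beta v)$. It therefore should not be expected to reproduce the reported gap between HB-SGE and NAG on quadratics; that gap presumably depends on details of the paper's update that the abstract does not give.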
