LGNAOCCOMLSep 10, 2025

Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis

arXiv:2509.08483v12 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work provides theoretical insights into optimization algorithms, specifically heavy-ball momentum, which is incremental but foundational for understanding gradient descent variants.

The paper analyzes gradient descent with Polyak heavy-ball momentum, proving that on an invariant manifold, it behaves like plain gradient descent with a modified loss, with global approximation bounds of O(h^R) for any finite order R. It also conducts a fine-grained combinatorial analysis, revealing hidden polynomial families and deriving continuous modified equations.

We analyze gradient descent with Polyak heavy-ball momentum (HB) whose fixed momentum parameter $β\in (0, 1)$ provides exponential decay of memory. Building on Kovachki and Stuart (2021), we prove that on an exponentially attractive invariant manifold the algorithm is exactly plain gradient descent with a modified loss, provided that the step size $h$ is small enough. Although the modified loss does not admit a closed-form expression, we describe it with arbitrary precision and prove global (finite "time" horizon) approximation bounds $O(h^{R})$ for any finite order $R \geq 2$. We then conduct a fine-grained analysis of the combinatorics underlying the memoryless approximations of HB, in particular, finding a rich family of polynomials in $β$ hidden inside which contains Eulerian and Narayana polynomials. We derive continuous modified equations of arbitrary approximation order (with rigorous bounds) and the principal flow that approximates the HB dynamics, generalizing Rosca et al. (2023). Approximation theorems cover both full-batch and mini-batch HB. Our theoretical results shed new light on the main features of gradient descent with heavy-ball momentum, and outline a road-map for similar analysis of other optimization algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes