A Rod Flow Model for Adam at the Edge of Stability

arXiv:2605.06821 · 17.21 citations

Predicted impact: top 24% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work provides a more accurate continuous-time model of the dynamics of adaptive optimizers such as Adam at the edge of stability, which matters to researchers studying optimization in deep learning.

The authors extend the rod flow model, previously developed for gradient descent, to momentum and adaptive gradient methods including Adam, enabling accurate tracking of discrete iterates through the edge-of-stability regime. In empirical evaluations on representative architectures, rod flow tracks the discrete iterates significantly more accurately than stable flow for all eight optimizers tested.
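As reported by Cohen et al. (arXiv:2207.14484), Adam's edge of stability is governed by a threshold on the preconditioned sharpness, $(2 + 2\beta_1)/((1 - \beta_1)\eta)$, which equals $38/\eta$ for $\beta_1 = 0.9$; setting $\beta_1 = 0$ recovers the gradient descent threshold $2/\eta$. The minimal sketch below is not rod flow. It only illustrates where that threshold comes from, under the simplifying assumption of a one-dimensional quadratic with curvature $\lambda$ and a frozen identity preconditioner, so that Adam's update reduces to EMA-style heavy ball momentum. The helper names `stability_threshold` and `run_quadratic` are hypothetical.

```python
def stability_threshold(lr, beta):
    # Largest curvature lam for which EMA-style momentum on lam * w**2 / 2 is stable:
    # lam < (2 + 2 * beta) / ((1 - beta) * lr). beta = 0 gives gradient descent's 2 / lr.
    return (2 + 2 * beta) / ((1 - beta) * lr)


def run_quadratic(lam, lr, beta, steps, w0=1.0):
    # Assumption: frozen identity preconditioner, so lam plays the role of
    # the preconditioned sharpness.
    w, m = w0, 0.0
    for _ in range(steps):
        g = lam * w                    # gradient of lam * w**2 / 2
        m = beta * m + (1 - beta) * g  # first-moment EMA, as in Adam
        w = w - lr * m                 # parameter update uses the freshly updated m
    return w


lr, beta = 0.01, 0.9
thr = stability_threshold(lr, beta)
print(round(thr))                                           # 3800, i.e. 38 / lr
print(abs(run_quadratic(0.9 * thr, lr, beta, 300)) < 1e-3)  # True: iterates decay
print(abs(run_quadratic(1.1 * thr, lr, beta, 300)) > 1e6)   # True: iterates blow up
```

Just above the threshold, the dominant eigenvalue of the linear update is real and below $-1$, so the iterate flips sign every step with growing amplitude; this is the instability that edge-of-stability training sits next to.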

Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extensions of these models to momentum methods remain underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object: a "rod." Here we extend rod flow to Adam by working in the joint phase space $(w, m)$ of parameters and first moment, and treating the second moment $\nu$ as a smooth auxiliary variable. We also develop rod flows for heavy ball momentum, Nesterov momentum, and scalar and per-component versions of RMSProp, Adam, and NAdam. For all eight optimizers, we empirically evaluate rod flow on representative machine learning architectures, where it tracks the discrete iterates through the edge-of-stability regime significantly more accurately than the corresponding stable flow.
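To make the phase-space view concrete, the minimal NumPy sketch below writes one Adam step as a map on the pair $(w, m)$, with the second moment $\nu$ carried along as separate auxiliary state rather than as a phase-space coordinate. It produces only the discrete iterates that rod flow is meant to track; it does not implement rod flow itself. The names `AdamState`, `adam_step`, and `phase_space_trajectory` are hypothetical, standard bias-corrected Adam is assumed, and the `per_component=False` branch assumes that the "scalar" variant shares a single $\nu$ built from the mean squared gradient, which may differ from the paper's definition.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class AdamState:
    w: np.ndarray   # parameters: first half of the phase-space point
    m: np.ndarray   # first moment: second half of the phase-space point
    nu: np.ndarray  # second moment: auxiliary variable, not a phase-space coordinate
    t: int = 0      # step counter, used for bias correction


def init_state(w0, per_component=True):
    w0 = np.asarray(w0, dtype=float)
    # Assumed "scalar" variant: one shared second-moment estimate (0-d array).
    nu0 = np.zeros_like(w0) if per_component else np.zeros(())
    return AdamState(w=w0.copy(), m=np.zeros_like(w0), nu=nu0)


def adam_step(s, grad_fn, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8,
              per_component=True):
    g = grad_fn(s.w)
    t = s.t + 1
    m = beta1 * s.m + (1 - beta1) * g
    # Per-component: elementwise g**2; assumed scalar variant: mean of g**2.
    g2 = g ** 2 if per_component else np.mean(g ** 2)
    nu = beta2 * s.nu + (1 - beta2) * g2
    m_hat = m / (1 - beta1 ** t)    # bias-corrected first moment
    nu_hat = nu / (1 - beta2 ** t)  # bias-corrected second moment
    w = s.w - lr * m_hat / (np.sqrt(nu_hat) + eps)
    return AdamState(w=w, m=m, nu=nu, t=t)


def phase_space_point(s):
    # Concatenate (w, m); nu is deliberately excluded from the phase space.
    return np.concatenate([s.w, s.m])


def phase_space_trajectory(w0, grad_fn, steps, per_component=True, **kw):
    s = init_state(w0, per_component)
    points = [phase_space_point(s)]
    for _ in range(steps):
        s = adam_step(s, grad_fn, per_component=per_component, **kw)
        points.append(phase_space_point(s))
    return np.stack(points)  # shape (steps + 1, 2 * dim)


# Usage: diagonal quadratic f(w) = 0.5 * sum(h * w**2).
h = np.array([1.0, 10.0])
traj = phase_space_trajectory([1.0, 1.0], lambda w: h * w, steps=500, lr=0.01)
print(traj.shape)  # (501, 4)
```

Keeping $\nu$ outside the state vector mirrors the modeling choice described above: the trajectory lives in a $2d$-dimensional space, while $\nu$ only rescales the step.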
