LG · Jul 15, 2025

AdaMuon: Adaptive Muon Optimizer

arXiv:2507.11005v2 · 39 citations · h-index: 8
Originality: Incremental advance
AI Analysis

The work addresses the need for more efficient optimizers in deep learning, particularly for large-scale training. It appears incremental, since it builds on existing methods such as Adam.

The paper tackles the problem of improving training efficiency for large-scale neural networks. It proposes AdaMuon, an optimizer that combines element-wise adaptivity with orthogonal updates, and reports more than 40% higher training efficiency than Adam in its experiments.

We propose AdaMuon, a novel optimizer that combines element-wise adaptivity with orthogonal updates for large-scale neural network training. AdaMuon incorporates two tightly coupled mechanisms: (1) an element-wise second-moment estimator applied to orthogonalized update directions, and (2) a sign-stabilized orthogonal update, where the momentum is first sign-transformed before orthogonalization. These two components jointly enable variance-adaptive scaling while maintaining stable update geometry. In addition, AdaMuon employs an RMS-aligned rescaling strategy to match the root-mean-square update magnitude to Adam, allowing direct reuse of existing learning rate schedules without extra tuning. Experiments demonstrate that AdaMuon not only maintains stability but can surpass Adam by more than 40% in training efficiency in large-scale scenarios.
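The three steps named in the abstract (sign transform, orthogonalization with element-wise second-moment scaling, RMS-aligned rescaling) can be sketched for a single 2-D weight matrix as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `orthogonalize` and `adamuon_step`, the hyperparameter defaults, the `target_rms` value of 0.2, and the omission of bias correction and weight decay are all assumptions made here. Orthogonalization uses an exact SVD for clarity, whereas Muon-style optimizers typically approximate it with a Newton-Schulz iteration.

```python
import numpy as np


def orthogonalize(m):
    # Exact polar factor via SVD: U @ Vt has all singular values equal to 1.
    # (Swapped in for the Newton-Schulz approximation used in practice.)
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def adamuon_step(w, grad, state, lr=1e-3, beta1=0.95, beta2=0.95,
                 eps=1e-8, target_rms=0.2):
    # All defaults here are illustrative assumptions, not values from the paper.
    # 1. Accumulate momentum from the raw gradient.
    state["m"] = beta1 * state["m"] + grad
    # 2. Sign-transform the momentum, then orthogonalize it.
    o = orthogonalize(np.sign(state["m"]))
    # 3. Element-wise second-moment estimate of the orthogonalized direction.
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * o * o
    o_hat = o / (np.sqrt(state["v"]) + eps)
    # 4. RMS-aligned rescaling: force the update's RMS to equal target_rms,
    #    so that lr can be reused from an Adam schedule.
    scale = target_rms * np.sqrt(o_hat.size) / (np.linalg.norm(o_hat) + eps)
    return w - lr * scale * o_hat


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 6))
g = rng.standard_normal((4, 6))
state = {"m": np.zeros_like(w), "v": np.zeros_like(w)}
new_w = adamuon_step(w, g, state, lr=0.1)
update_rms = np.sqrt(np.mean((new_w - w) ** 2))
```

With the rescaling in step 4, the root-mean-square of the applied update is `lr * target_rms` regardless of the matrix shape, which is the property that would let a learning rate schedule tuned for Adam carry over.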

Code Implementations: 1 repo