LG, ML | Sep 29, 2025

AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates

arXiv:2509.24320v2 | 1 citation | Has Code

Originality: Incremental advance

AI Analysis

This work targets the computational cost of semi-orthogonal momentum updates in machine-learning optimizers and offers practitioners a cheaper alternative to methods such as Muon. It appears incremental because it builds on prior momentum-based approaches.

The authors tackled the computational inefficiency of semi-orthogonal momentum updates in optimization by proposing AuON, a linear-time method that avoids explicit semi-orthogonalization while maintaining performance, achieving results comparable to AdamW and Muon on vision and language benchmarks.

Orthogonal gradient updates have emerged as a promising direction in optimization for machine learning. However, traditional approaches such as SVD/QR decomposition incur prohibitive computational costs of O(n^3) and underperform compared to well-tuned SGD with momentum, since momentum is applied only after strict orthogonalization. Recent advances, such as Muon, improve efficiency by applying momentum before orthogonalization and producing semi-orthogonal matrices via Newton-Schulz iterations, reducing complexity to O(n^2). Nevertheless, quadratic costs remain a bottleneck. In this work, we study the semi-orthogonal properties of momentum-based updates and develop a method to bound momentum updates under a spectral-norm trust region, preserving directional information without requiring explicit semi-orthogonalization. We propose AuON (Alternative Unit-norm momentum updates by Normalized nonlinear scaling), a linear-time optimizer that achieves strong performance without constructing semi-orthogonal matrices, while preserving structural alignment and reconditioning ill-posed updates. Our approach combines hyperbolic-cosine RMS scaling transformations with normalization, demonstrating both effectiveness and computational efficiency compared to Newton-Schulz methods. We further introduce a hybrid variant (Hybrid-AuON) that applies a single Newton-Schulz iteration. Experiments across vision and language benchmarks show that AuON and its hybrid variant achieve performance comparable to strong baselines such as AdamW and Muon. Code is available at: https://github.com/ryyzn9/AuON
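To make the contrast in the abstract concrete, here is a minimal NumPy sketch. It is not the AuON update rule and not Muon's tuned iteration: `newton_schulz` uses the classical cubic Newton-Schulz step as a stand-in for the semi-orthogonalization idea, and `frobenius_normalize` is an assumed, simplified example of a linear-time scaling that keeps the update inside a spectral-norm ball while preserving its direction. Both function names and the 4x6 test matrix are hypothetical.

```python
import numpy as np


def frobenius_normalize(m, eps=1e-12):
    """Scale m to unit Frobenius norm (one pass over the entries).

    Because the spectral norm never exceeds the Frobenius norm,
    the result has spectral norm <= 1 and the same direction as m.
    """
    return m / (np.linalg.norm(m) + eps)


def newton_schulz(m, steps=5):
    """Classical cubic Newton-Schulz iteration (illustrative stand-in).

    Starting from a matrix with singular values in (0, 1], each step
    pushes every singular value toward 1, so the iterate approaches a
    semi-orthogonal matrix. Each step costs matrix products.
    """
    x = frobenius_normalize(m)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x


rng = np.random.default_rng(0)
momentum = rng.standard_normal((4, 6))  # hypothetical momentum buffer

bounded = frobenius_normalize(momentum)        # linear-time, direction kept
semi_orth = newton_schulz(momentum, steps=30)  # rows become orthonormal

print("spectral norm of bounded update:", np.linalg.norm(bounded, 2))
print("max deviation from identity:",
      np.abs(semi_orth @ semi_orth.T - np.eye(4)).max())
```

The normalization touches each entry once, whereas every Newton-Schulz step needs matrix products. That gap is the cost the abstract says AuON avoids; the actual hyperbolic-cosine RMS scaling is defined in the paper and the linked repository.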
