LG · AI · May 10

Intrinsic Muon: Spectral Optimization on Riemannian Matrix Manifolds

arXiv:2605.09238 · 82.91 citations
Predicted impact: top 13% in LG · last 90 days · Originality: Highly original
AI Analysis

Provides a principled framework for norm-constrained optimization on matrix manifolds, removing prior limitations and enabling efficient training of manifold-constrained parameters in large-scale learning.

Muon optimizers are extended to Riemannian manifolds (fixed-rank, SPD, Stiefel, Grassmann) via intrinsic norm-constrained linear maximization oracles. This yields closed-form updates with convergence guarantees whose rate constants depend only on the manifold dimension (only on the rank, for the fixed-rank manifold). Experiments on LoRA finetuning, image classification, and subspace learning demonstrate the approach's efficacy.
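As a reference point, the ambient Euclidean LMOs that Muon-style optimizers solve have well-known closed forms for the spectral, Frobenius, and nuclear norms. The minimal NumPy sketch below computes each one from an SVD. The function names (`lmo_spectral`, `lmo_frobenius`, `lmo_nuclear`, `muon_like_step`) are hypothetical, and this illustrates only the Euclidean baseline, not the intrinsic iMuon oracle.

```python
import numpy as np

def lmo_spectral(G):
    """argmin <G, X> over ||X||_op <= 1: minus the polar factor U V^T of G."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -U @ Vt

def lmo_frobenius(G):
    """argmin <G, X> over ||X||_F <= 1: the normalized negative gradient."""
    return -G / np.linalg.norm(G, "fro")

def lmo_nuclear(G):
    """argmin <G, X> over ||X||_* <= 1: rank-one step from the top singular pair."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -np.outer(U[:, 0], Vt[0, :])

def muon_like_step(W, G, lr, lmo=lmo_spectral):
    """One ambient step W <- W + lr * LMO(G); no momentum, illustration only."""
    return W + lr * lmo(G)

# Illustrative usage on a random "gradient"
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 3))
W = muon_like_step(np.zeros((5, 3)), G, lr=0.1)
```

In each case the optimal value is minus the dual norm of G (nuclear, Frobenius, and spectral, respectively). Practical Muon implementations replace the exact SVD with a few Newton-Schulz iterations applied to a momentum buffer; the exact SVD is used here only for clarity.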

Muon and related norm-constrained matrix optimizers have become central to large-scale learning. Each of their updates is formulated as a linear maximization oracle (LMO) over an ambient matrix-norm ball in unconstrained Euclidean space. However, this formulation does not generalize cleanly to manifold-valued parameters such as low-rank factorizations, orthogonality constraints, or symmetric positive definite (SPD) matrices. Naively restricting the Muon LMO to the tangent space (i) breaks quotient symmetries and (ii) couples the tangent-space constraint with an ambient norm bound, thereby obstructing closed-form solutions on various manifolds of interest. We resolve both issues with a single observation: every Riemannian metric canonically lifts a unitarily invariant Euclidean norm to an intrinsic norm on each tangent space, and the resulting intrinsic-norm-constrained LMO is symmetry preserving. Building on this, we introduce intrinsic Muon (iMuon), a unified framework that yields closed-form updates on the fixed-rank, SPD, Stiefel, and Grassmann manifolds for any unitarily invariant norm, including the spectral, Frobenius, and nuclear norms. We establish convergence guarantees for both deterministic and stochastic iMuon with rate constants that depend only on the manifold dimension. Notably, on the fixed-rank manifold this constant depends only on the rank, making the rate independent of factor conditioning and removing the runtime factor rescaling required by prior work. Experiments on LoRA finetuning of LLMs, image classification, and subspace learning illustrate the efficacy of the proposed approach.
