LG · OC · May 26

Convergence of Spectral Descent for Non-smooth Optimization

arXiv:2605.26977 · 19.2
Predicted impact: top 25% in LG · last 90 days
Originality: Incremental advance
AI Analysis

This paper provides the first theoretical convergence guarantees for Muon-type optimizers in non-smooth settings, addressing a gap in understanding for practitioners training large models.

The paper establishes global linear convergence for Spectral Descent and its truncated variant on non-smooth convex problems under Lipschitz continuity and sharpness conditions, and derives guarantees for robust low-rank matrix recovery.

The Muon optimizer has recently demonstrated remarkable empirical success in training large language models. However, the theoretical understanding of its mechanisms remains limited. Current convergence guarantees for Muon rely heavily on smoothness assumptions, leaving its non-smooth convergence behavior largely unexplored. In this work, we take a step toward bridging this gap by investigating Spectral Descent (SD), a simplified variant of Muon, together with its truncated counterpart, Truncated Spectral Descent (TSD). Under convexity, Lipschitz continuity, and sharpness conditions, we establish global linear convergence for both SD and TSD in non-smooth convex formulations. We also study regularized variants equipped with decoupled weight decay and derive sublinear convergence guarantees through their connection with Frank-Wolfe methods. Finally, we apply our theoretical framework to robust low-rank matrix recovery under mixed sparse and dense noise regimes and provide rigorous recovery guarantees. Numerical experiments support the theoretical findings and demonstrate the effectiveness of Muon-type methods for non-smooth optimization.
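
The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions: Spectral Descent is taken to move along the orthogonalized subgradient U V^T (from the SVD G = U diag(s) V^T), the step size is assumed to decay geometrically (a common choice for sharp non-smooth problems, not confirmed by the source), and decoupled weight decay is assumed to shrink the iterate by (1 - eta * weight_decay). The names msign, sd_step, and spectral_descent are hypothetical, and the truncated variant (TSD) is not covered. Under these assumptions, the weight-decay step equals a Frank-Wolfe step with step size eta * weight_decay over the spectral-norm ball of radius 1 / weight_decay, which is the kind of connection the abstract refers to.

```python
import numpy as np


def msign(G, tol=1e-12):
    """Return U @ V^T over the nonzero singular directions of G = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Drop directions with (numerically) zero singular values.
    keep = s > tol * s.max()
    return U[:, keep] @ Vt[keep, :]


def sd_step(X, G, eta, weight_decay=0.0):
    """One assumed Spectral Descent step with optional decoupled weight decay."""
    return (1.0 - eta * weight_decay) * X - eta * msign(G)


def spectral_descent(subgrad, X0, eta0, decay, steps, weight_decay=0.0):
    """Run sd_step with a geometrically decaying step size (an assumption)."""
    X = np.array(X0, dtype=float)
    eta = eta0
    for _ in range(steps):
        X = sd_step(X, subgrad(X), eta, weight_decay)
        eta *= decay  # eta_k = eta0 * decay**k
    return X


if __name__ == "__main__":
    # Illustrative non-smooth objective: f(X) = sum(|X - X_star|).
    rng = np.random.default_rng(0)
    X_star = rng.standard_normal((5, 4))
    subgrad = lambda X: np.sign(X - X_star)  # one valid subgradient of f
    X_out = spectral_descent(subgrad, np.zeros((5, 4)), eta0=1.0, decay=0.95, steps=200)
    print("final objective:", np.abs(X_out - X_star).sum())
```

The demo objective is chosen only because its subgradient is easy to write down; whether it meets the paper's sharpness conditions, and which step-size schedule the analysis actually requires, is not stated in the abstract.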

