LG · OC · Sep 15, 2025

Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training

arXiv:2509.11983v1 · 13 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work targets more efficient foundation model training by making an incremental improvement to an existing optimizer, Muon.

The paper tackles large-scale matrix optimization in neural network training by proposing low-rank orthogonalization, which exploits the low-rank nature of gradients. Building on it, the authors introduce low-rank variants of matrix-signed gradient descent and of the Muon optimizer. In GPT-2 and LLaMA pretraining, low-rank Muon outperforms a carefully tuned vanilla Muon.

Neural network (NN) training is inherently a large-scale matrix optimization problem, yet the matrix structure of NN parameters has long been overlooked. Recently, the optimizer Muon (Jordan et al.), which explicitly exploits this structure, has gained significant attention for its strong performance in foundation model training. A key component contributing to Muon's success is matrix orthogonalization. In this paper, we propose low-rank orthogonalization, which explicitly leverages the low-rank nature of gradients during NN training. Building on this, we propose low-rank matrix-signed gradient descent and a low-rank variant of Muon. Our numerical experiments demonstrate the superior performance of low-rank orthogonalization, with the low-rank Muon achieving promising results in GPT-2 and LLaMA pretraining, surpassing the performance of the carefully tuned vanilla Muon. Theoretically, we establish the iteration complexity of the low-rank matrix-signed gradient descent for finding an approximate stationary solution, as well as that of low-rank Muon for finding an approximate stochastic stationary solution under heavy-tailed noise.
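To make the idea of low-rank orthogonalization concrete, here is a minimal NumPy sketch. It is not taken from the paper: the abstract does not spell out the construction, so the function names, the Gaussian sketching step, and the use of an exact SVD are all assumptions made for illustration. Orthogonalizing a gradient matrix G with SVD G = U S V^T means replacing its singular values with ones, giving U V^T (Muon approximates this with Newton-Schulz iterations; an SVD is used here only for clarity). The hypothetical low-rank variant below first compresses G onto a sketched rank-r subspace and orthogonalizes the much smaller matrix.

```python
import numpy as np


def orthogonalize(G):
    """Exact orthogonalization: set every singular value of G to 1 (returns U V^T)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def low_rank_orthogonalize(G, rank, rng):
    """Hypothetical low-rank variant (an assumption, not the paper's algorithm).

    Orthogonalizes G restricted to a randomly sketched subspace of dimension `rank`.
    """
    m, n = G.shape
    # Gaussian test matrix used to sketch the column space of G.
    omega = rng.standard_normal((n, rank))
    # Q is m x rank with orthonormal columns approximately spanning range(G).
    Q, _ = np.linalg.qr(G @ omega)
    # Compress G to a small rank x n matrix and orthogonalize that instead.
    B = Q.T @ G
    # Lift back to m x n; the result has `rank` singular values equal to 1.
    return Q @ orthogonalize(B)


# Usage: a gradient-like matrix that is exactly rank 3.
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
O = low_rank_orthogonalize(G, rank=3, rng=rng)
```

In this sketch the SVD is taken on a rank x n matrix rather than the full m x n gradient, which is where the savings would come from when the rank is much smaller than m. When G is exactly rank r, the output coincides with the orthogonalization of G restricted to its nonzero singular directions.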

