CL · LG · Nov 8, 2025

MuonAll: Muon Variant for Efficient Finetuning of Large Language Models

arXiv:2511.06086v1 · 6 citations · h-index: 1 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient finetuning optimizers in machine learning, but it is incremental: it builds on the existing Muon optimizer and does not demonstrate clear superiority over AdamW.

The paper adapts the Muon optimizer for efficient finetuning of large language models. It introduces MuonAll, which brings all parameters under Muon by representing them as 2D matrices. On models of up to half a billion parameters, both Muon and MuonAll perform comparably to AdamW across major benchmarks.

The Muon optimizer has demonstrated robust results in pretraining language models, but its performance in finetuning existing public pretrained models has not yet been explored. Currently, Muon is used alongside AdamW, which leaves room for improvement by bringing all parameters under Muon. We introduce MuonAll, which incorporates all parameters into Muon by transforming them into 2D matrices. We conduct extensive finetuning experiments on publicly available language models with up to half a billion parameters. Muon and MuonAll perform on par with AdamW across major benchmarks, highlighting their effectiveness as alternative optimizers. We open-source the distributed implementations of Muon and MuonAll at https://github.com/Saurabh750/optimizer
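To make the idea of "all parameters as 2D matrices" concrete, here is a minimal sketch in pure Python, not the authors' implementation. It rests on several assumptions: a 1D parameter is viewed as a 1 x n matrix (the abstract does not say which transformation MuonAll uses), the orthogonalization uses the classical cubic Newton-Schulz iteration rather than whatever tuned variant the released code may use, and the names `orthogonalize`, `to_2d`, and `muon_step` as well as the default `lr` and `beta` are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def orthogonalize(g, steps=10, eps=1e-12):
    """Approximate the nearest semi-orthogonal matrix to g.

    Uses the classical cubic Newton-Schulz iteration
    X <- 1.5 X - 0.5 (X X^T) X after Frobenius normalization.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row)) + eps
    x = [[v / norm for v in row] for row in g]
    tall = len(x) > len(x[0])
    if tall:  # work in the wide orientation so X X^T is the smaller Gram matrix
        x = transpose(x)
    for _ in range(steps):
        cubic = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * cv for xv, cv in zip(xr, cr)]
             for xr, cr in zip(x, cubic)]
    return transpose(x) if tall else x


def to_2d(param):
    """ASSUMPTION: treat a 1D parameter (e.g. a bias) as a 1 x n matrix."""
    if param and isinstance(param[0], list):
        return param, False
    return [list(param)], True


def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update: momentum, orthogonalize, then step."""
    p2d, was_1d = to_2d(param)
    g2d, _ = to_2d(grad)
    m2d, _ = to_2d(momentum)
    # Accumulate momentum, then orthogonalize it to get the update direction.
    m2d = [[beta * m + g for m, g in zip(mr, gr)] for mr, gr in zip(m2d, g2d)]
    update = orthogonalize(m2d)
    p2d = [[p - lr * u for p, u in zip(pr, ur)] for pr, ur in zip(p2d, update)]
    if was_1d:  # hand 1D parameters back in their original shape
        return p2d[0], m2d[0]
    return p2d, m2d
```

Under the 1 x n assumption, orthogonalizing a 1D parameter's momentum simply normalizes it to unit length, so a call such as `muon_step([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], lr=0.1)` moves the bias along the unit-norm gradient direction. Other 2D embeddings of a vector would give different update directions, so this choice should be read as illustrative only.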

