LG · ML · Feb 18

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

arXiv:2602.16340v1 · 2 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work provides theoretical insights into optimization behavior for researchers in machine learning, but it is incremental as it extends existing results on steepest descent and momentum-based optimizers.

The paper studies the implicit bias of momentum-based optimizers such as Adam and Muon on smooth homogeneous neural networks. It shows that these algorithms are biased towards KKT points of margin maximization problems, and its experiments confirm that the choice of optimizer determines which margin is maximized.
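For orientation, the margin maximization problem referred to here is usually written as follows in the literature on homogeneous models. This is a sketch under assumed notation, not the paper's own statement: the model $f(\theta; x)$, the labeled data $(x_i, y_i)_{i=1}^n$, the multipliers $\lambda_i$, and the generic norm $\|\cdot\|$ are all symbols introduced for illustration.

$$
\min_{\theta} \ \tfrac{1}{2}\|\theta\|^2 \quad \text{subject to} \quad y_i f(\theta; x_i) \ge 1 \quad \text{for all } i = 1, \dots, n .
$$

A point $\theta$ is a KKT point of this problem if it is feasible and there exist multipliers $\lambda_1, \dots, \lambda_n \ge 0$ such that

$$
\sum_{i=1}^{n} \lambda_i \, y_i \nabla_\theta f(\theta; x_i) \in \partial\!\left(\tfrac{1}{2}\|\theta\|^2\right), \qquad \lambda_i \bigl(y_i f(\theta; x_i) - 1\bigr) = 0 \quad \text{for all } i .
$$

Under this reading, the norm $\|\cdot\|$ is the one tied to the optimizer, for example $\ell_2$, $\ell_\infty$, or the spectral norm, which is why different optimizers can end up maximizing different margins.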

We study the implicit bias of momentum-based optimizers on homogeneous models. We first extend existing results on the implicit bias of steepest descent in homogeneous models to normalized steepest descent with an optional learning rate schedule. We then show that for smooth homogeneous models, momentum steepest descent algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are approximate steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms too have a bias towards KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the $\ell_\infty$ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid norm. Our experiments corroborate the theory and show that the identity of the margin maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and momentum-based optimizers in linear models.
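To make the three norms in the abstract concrete, here is a minimal NumPy sketch of the normalized steepest descent direction for each one, followed by a single momentum step. It is an illustration under stated assumptions, not the paper's implementation: the function names `steepest_direction` and `momentum_step`, the exponential-moving-average form of the momentum, and the use of an exact SVD for the spectral case are all choices made here (practical Muon implementations typically approximate the orthogonalization iteratively).

```python
import numpy as np

def steepest_direction(grad, norm):
    """Return the unit-norm direction d that maximizes <grad, d> under `norm`.

    Hypothetical helper for illustration; assumes `grad` is a nonzero 2-D array.
    """
    if norm == "l2":
        # l2 (Frobenius) norm: the direction is the normalized gradient.
        return grad / np.linalg.norm(grad)
    if norm == "linf":
        # Entrywise l_inf norm: the direction is the sign of the gradient (Signum-style).
        return np.sign(grad)
    if norm == "spectral":
        # Spectral norm: with grad = U S V^T, the direction is U V^T (Muon-style).
        u, _, vt = np.linalg.svd(grad, full_matrices=False)
        return u @ vt
    raise ValueError(f"unknown norm: {norm}")

def momentum_step(w, m, grad, lr, beta, norm):
    """One momentum steepest descent step (assumed EMA form of the momentum buffer)."""
    m = beta * m + (1.0 - beta) * grad          # update the momentum buffer
    w = w - lr * steepest_direction(m, norm)    # step along the normalized direction
    return w, m

if __name__ == "__main__":
    G = np.array([[3.0, 0.0], [0.0, -4.0]])
    for name in ("l2", "linf", "spectral"):
        print(name)
        print(steepest_direction(G, name))
```

The three branches differ only in which unit ball the direction is drawn from, which matches the abstract's pairing of MomentumGD with the $\ell_2$ norm, Signum with the $\ell_\infty$ norm, and Muon with the spectral norm.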

