OC · AI · LG · Dec 10, 2025

The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization

arXiv:2512.09678v1 · 2 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses optimization challenges in training large language models; it is an incremental advance that builds on the existing Muon and Dion methods.

The paper tackles the problem of optimizing weight matrices in large language models by introducing Fanions, a family of algorithms based on duals of Ky Fan k-norms and their combinations with other norms. The results show that F-Muon and S-Muon match Muon's performance across tasks and outperform it on a synthetic linear least squares problem.

In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm underlying the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $l_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively. Their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance, while outperforming vanilla Muon on a synthetic linear least squares problem.
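
To make the norms in the abstract concrete, here is a minimal NumPy sketch of the standard definitions: the Ky Fan $k$-norm (the sum of the $k$ largest singular values), its dual norm (the larger of the spectral norm and the nuclear norm divided by $k$), and the rank-$k$ matrix $U_k V_k^\top$ that maximizes $\langle G, X \rangle$ over the unit ball of that dual norm. The function names (`ky_fan_norm`, `ky_fan_dual_norm`, `rank_k_direction`) and the random test matrix are illustrative assumptions; this is not the paper's Fanion update, which may include momentum, scaling, or approximate factorizations not shown here.

```python
import numpy as np


def ky_fan_norm(A, k):
    """Ky Fan k-norm: sum of the k largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float(s[:k].sum())


def ky_fan_dual_norm(A, k):
    """Dual of the Ky Fan k-norm: max(spectral norm, nuclear norm / k)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(max(s[0], s.sum() / k))


def rank_k_direction(G, k):
    """Hypothetical helper: U_k V_k^T built from the top-k singular vectors of G.

    This matrix has dual Ky Fan k-norm equal to 1 and attains
    <G, X> = ky_fan_norm(G, k), so it is a steepest-ascent direction
    with respect to the dual norm (assuming exact SVD).
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :k] @ Vt[:k, :]


# Illustrative data only: a random stand-in for a gradient matrix.
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))
k = 2
D = rank_k_direction(G, k)

inner = float(np.sum(G * D))  # Frobenius inner product <G, D>
print(inner, ky_fan_norm(G, k), ky_fan_dual_norm(D, k))
```

With $k = 1$ the Ky Fan norm is the spectral norm, and with $k = \min(m, n)$ it is the nuclear norm, in which case the direction above becomes the full orthogonalized factor $U V^\top$ associated with Muon; smaller $k$ gives a low-rank direction, which is the sense in which such updates resemble Dion.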

