LG · DS · OC · Oct 10, 2025

Convergence of optimizers implies eigenvalues filtering at equilibrium

arXiv:2510.09034v1 · 3 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses how the choice of optimizer biases which minima are found, a question relevant to machine learning practitioners, and offers incremental insights into eigenvalue filtering mechanisms.

The paper investigates how different optimizers act as eigenvalue filters at convergence. It shows that gradient descent avoids the sharpest minima, while Sharpness-Aware Minimization (SAM) actively favors wider basins. It then proposes two new algorithms that strengthen this filtering to promote wider minima, and supports the claims with theoretical analysis and numerical experiments on feed-forward neural networks.
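The filtering claim for plain gradient descent can be checked on a quadratic. Near a minimum with Hessian eigenvalue λ > 0, one GD step with step size η multiplies the deviation along that eigenvector by 1 - ηλ. The deviation therefore shrinks only if |1 - ηλ| < 1, that is 0 < λ < 2/η. The sketch below is a minimal illustration on a diagonal quadratic, not the paper's code. The function name `gd_on_quadratic`, the step size and the eigenvalues are illustrative assumptions.

```python
import numpy as np

def gd_on_quadratic(eigenvalues, lr, steps=500):
    """Run plain gradient descent on f(x) = 0.5 * sum_i lam_i * x_i**2.

    Along eigen-direction i, one step multiplies the coordinate by (1 - lr * lam_i),
    so that coordinate shrinks only when |1 - lr * lam_i| < 1.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    x = np.ones_like(lam)  # start at distance 1 from the minimum in every direction
    for _ in range(steps):
        x = x - lr * lam * x  # gradient of f is lam * x (diagonal Hessian)
    return x

lr = 0.1                          # cutoff 2 / lr = 20
eigs = [1.0, 10.0, 19.0, 21.0, 40.0]
final = gd_on_quadratic(eigs, lr)
for lam, xi in zip(eigs, final):
    status = "converges" if abs(xi) < 1e-6 else "escapes"
    print(f"lambda={lam:5.1f}  2/lr={2 / lr:.0f}  -> {status}")
```

With lr = 0.1 the directions with curvature 1, 10 and 19 contract, while those with curvature 21 and 40 blow up. In this sense the step size acts as a filter on the eigenvalues of the minima GD can settle into.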

Ample empirical evidence in deep neural network training suggests that a variety of optimizers tend to find nearly global optima. In this article, we adopt the reversed perspective that convergence to an arbitrary point is assumed rather than proven, focusing on the consequences of this assumption. From this viewpoint, in line with recent advances on the edge-of-stability phenomenon, we argue that different optimizers effectively act as eigenvalue filters determined by their hyperparameters. Specifically, the standard gradient descent method inherently avoids the sharpest minima, whereas Sharpness-Aware Minimization (SAM) algorithms go even further by actively favoring wider basins. Inspired by these insights, we propose two novel algorithms that exhibit enhanced eigenvalue filtering, effectively promoting wider minima. Our theoretical analysis leverages a generalized Hadamard–Perron stable manifold theorem and applies to general semialgebraic $C^2$ functions, without requiring additional non-degeneracy conditions or global Lipschitz bound assumptions. We support our conclusions with numerical experiments on feed-forward neural networks.
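The following sketch illustrates why SAM filters more aggressively than GD. It does not reproduce the paper's SAM analysis. As an assumption, it uses unnormalized SAM (USAM), which takes the ascent step x + ρ∇f(x) without normalizing the gradient. USAM is a common simplification, not necessarily the variant studied in the paper. On a quadratic with curvature λ, one USAM step multiplies the deviation by 1 - ηλ(1 + ρλ). Linear stability therefore requires ηλ(1 + ρλ) < 2, which is a stricter cutoff than GD's λ < 2/η. Roughly, a minimum that violates such a cutoff is an unstable fixed point of the update map. Stable-manifold arguments of the kind the abstract mentions are then used to show that convergence to it is exceptional. All names and hyperparameters below are illustrative.

```python
import math
import numpy as np

def gd_threshold(lr):
    """Largest Hessian eigenvalue at which a minimum is linearly stable for GD."""
    return 2.0 / lr

def usam_threshold(lr, rho):
    """Same cutoff for unnormalized SAM (an assumption, see lead-in) on a quadratic.

    With curvature lam, the ascent point is (1 + rho*lam) * x, so one step multiplies
    x by 1 - lr*lam*(1 + rho*lam). Stability needs lr*lam*(1 + rho*lam) < 2, i.e. lam
    below the positive root of rho*lam**2 + lam - 2/lr = 0.
    """
    return (-1.0 + math.sqrt(1.0 + 8.0 * rho / lr)) / (2.0 * rho)

def usam_on_quadratic(eigenvalues, lr, rho, steps=500):
    """Run unnormalized SAM on f(x) = 0.5 * sum_i lam_i * x_i**2 from x = 1."""
    lam = np.asarray(eigenvalues, dtype=float)
    x = np.ones_like(lam)
    for _ in range(steps):
        x_adv = x + rho * lam * x   # ascent step: x + rho * grad f(x)
        x = x - lr * lam * x_adv    # descend using the gradient at the ascent point
    return x

lr, rho = 0.1, 0.05
print(f"GD keeps minima with lambda < {gd_threshold(lr):.2f}")           # 20.00
print(f"USAM keeps minima with lambda < {usam_threshold(lr, rho):.2f}")  # 12.36
final = usam_on_quadratic([10.0, 15.0], lr, rho)
```

A minimum with curvature 15 is stable for GD at lr = 0.1 but not for USAM with rho = 0.05, so the SAM-style update discards it. As rho tends to 0, the USAM cutoff tends to the GD cutoff 2/lr.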
