LGOCMLNov 29, 2023

Critical Influence of Overparameterization on Sharpness-aware Minimization

arXiv:2311.17539v53 citationsh-index: 18
Originality Incremental advance
AI Analysis

This addresses a gap in understanding SAM's effectiveness for deep learning practitioners, but it is incremental as it builds on existing SAM theory.

The paper investigates how overparameterization affects Sharpness-Aware Minimization (SAM), finding that SAM consistently benefits from overparameterization, with faster convergence and more uniform Hessian moments compared to SGD.

Sharpness-Aware Minimization (SAM) has attracted considerable attention for its effectiveness in improving generalization in deep neural network training by explicitly minimizing sharpness in the loss landscape. Its success, however, relies on the assumption that there exists sufficient variability of flatness in the solution space-a condition commonly facilitated by overparameterization. Yet, the interaction between SAM and overparameterization has not been thoroughly investigated, leaving a gap in understanding precisely how overparameterization affects SAM. Thus, in this work, we analyze SAM under varying degrees of overparameterization, presenting both empirical and theoretical findings that reveal its critical influence on SAM's effectiveness. First, we conduct extensive numerical experiments across diverse domains, demonstrating that SAM consistently benefits from overparameterization. Next, we attribute this phenomenon to the interplay between the enlarged solution space and increased implicit bias resulting from overparameterization. Furthermore, we show that this effect is particularly pronounced in practical settings involving label noise and sparsity, and yet, sufficient regularization is necessary. Last but not least, we provide other theoretical insights into how overparameterization helps SAM achieve minima with more uniform Hessian moments compared to SGD, and much faster convergence at a linear rate.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes