LG · Jun 13, 2022

Towards Understanding Sharpness-Aware Minimization

arXiv:2206.06232v1 · 196 citations · h-index: 18 · Has Code
Originality: Incremental advance
AI Analysis

This work offers foundational insight into the mechanisms behind SAM's generalization and will interest researchers in machine-learning optimization, although it is an incremental advance that builds on existing SAM theory.

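The authors address the incomplete theoretical justifications for Sharpness-Aware Minimization (SAM) by analyzing its implicit bias in diagonal linear networks. They prove that, for a certain class of problems, SAM selects solutions that generalize better than those found by standard gradient descent, and that this effect is amplified by m-sharpness. They also study the implicit bias empirically on non-linear networks, where fine-tuning a standard model with SAM can yield significant generalization improvements.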
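To make the theoretical setting concrete, here is a small NumPy sketch of a diagonal linear network trained with full-batch gradient descent or full-batch SAM. It assumes the parameterization β = u ⊙ v, the initialization u = α·1, v = 0, and a squared loss; the function names, hyperparameters, and setup are illustrative assumptions, and the paper's exact parameterization and scaling may differ.

```python
import numpy as np

def dln_predict(u, v, X):
    """Diagonal linear network (assumed form): f(x) = <u * v, x>, i.e. beta = u * v."""
    return X @ (u * v)

def dln_loss_and_grad(u, v, X, y):
    """Loss 0.5 * mean((X(u*v) - y)^2) and its gradients w.r.t. u and v."""
    n = X.shape[0]
    r = dln_predict(u, v, X) - y          # residuals, shape (n,)
    loss = 0.5 * np.mean(r ** 2)
    g_beta = X.T @ r / n                  # gradient w.r.t. the effective weights beta
    # Chain rule through the elementwise product beta = u * v.
    return loss, g_beta * v, g_beta * u

def train(X, y, alpha=0.1, lr=0.05, steps=2000, rho=0.0):
    """Full-batch GD (rho == 0) or full-batch SAM (rho > 0); returns beta = u * v.

    Hypothetical helper: init u = alpha * 1, v = 0 is an illustrative choice.
    """
    d = X.shape[1]
    u = np.full(d, alpha)
    v = np.zeros(d)
    for _ in range(steps):
        _, gu, gv = dln_loss_and_grad(u, v, X, y)
        if rho > 0:
            # SAM: move to the first-order worst-case point in an L2 ball of
            # radius rho (joint over u and v), then take the gradient there.
            norm = np.sqrt(np.sum(gu ** 2) + np.sum(gv ** 2)) + 1e-12
            _, gu, gv = dln_loss_and_grad(u + rho * gu / norm,
                                          v + rho * gv / norm, X, y)
        u -= lr * gu
        v -= lr * gv
    return u * v
```

Comparing the β returned for `rho=0` (GD) and `rho>0` (SAM) on an underdetermined sparse regression problem is one way to explore numerically the implicit-bias effect the summary describes.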

Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations and significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM, which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima, are incomplete. Moreover, there are no explanations for the success of using $m$-sharpness in SAM, which has been shown to be essential for generalization. To better understand this aspect of SAM, we theoretically analyze its implicit bias for diagonal linear networks. We prove that SAM always chooses a solution that enjoys better generalization properties than standard gradient descent for a certain class of problems, and this effect is amplified by using $m$-sharpness. We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
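The abstract describes SAM's worst-case perturbation and $m$-sharpness only in words. A minimal NumPy sketch of one SAM update with $m$-sharpness follows, assuming the common first-order approximation of the inner maximization (perturbation ε = ρ∇L/‖∇L‖, followed by a gradient step evaluated at w + ε) and a toy least-squares loss. The names `mse_grad` and `m_sam_step` and all default values are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X w - y)^2) w.r.t. w (a stand-in loss for illustration)."""
    return X.T @ (X @ w - y) / X.shape[0]

def m_sam_step(w, X, y, lr=0.1, rho=0.05, m=4, grad_fn=mse_grad):
    """One SAM step with m-sharpness (hypothetical helper).

    The batch is split into chunks of m examples; each chunk gets its own
    worst-case perturbation, and the resulting gradients are averaged.
    """
    n = X.shape[0]
    assert n % m == 0, "batch size must be divisible by m in this sketch"
    sam_grads = []
    for start in range(0, n, m):
        Xc, yc = X[start:start + m], y[start:start + m]
        g = grad_fn(w, Xc, yc)
        # First-order worst-case perturbation within an L2 ball of radius rho.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Gradient at the perturbed weights, on the same chunk of data.
        sam_grads.append(grad_fn(w + eps, Xc, yc))
    # Average the per-chunk SAM gradients and take a descent step.
    return w - lr * np.mean(sam_grads, axis=0)
```

Setting `m` equal to the batch size recovers ordinary batch-level SAM, while a smaller `m` computes a separate perturbation for each sub-batch, which is the $m$-sharpness variant the abstract refers to. With `rho=0` the step reduces to plain gradient descent.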

Code Implementations: 1 repo
