LG, NE, ML · Sep 21, 2023

Sharpness-Aware Minimization and the Edge of Stability

arXiv:2309.12488v6 · 18 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work provides insights into the training dynamics of SAM, a method known to improve generalization, but is incremental as it extends prior analysis from gradient descent to SAM.

The paper investigates the 'edge of stability' phenomenon for Sharpness-Aware Minimization (SAM), deriving a theoretical edge that depends on gradient norm and empirically validating it on three deep learning tasks.

Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $η$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/η$, after which it fluctuates around this value. The quantity $2/η$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
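The local quadratic reasoning described above can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the paper's derivation: it uses a one-dimensional quadratic loss $\frac{1}{2}λx^2$ in place of a network's loss, a SAM step with a normalized ascent direction of radius $ρ$, and hypothetical helper names (`gd_step`, `sam_step`, `sam_edge_1d`). In this toy model, GD shrinks $|x|$ exactly when $λ < 2/η$, while the SAM step shrinks $|x|$ only when $λ + ρλ^2/\|g\| < 2/η$, where $\|g\|$ is the gradient norm, so the resulting threshold depends on the gradient norm and lies below $2/η$.

```python
import math


def gd_step(x, lam, eta):
    """One gradient-descent step on the 1-D quadratic loss 0.5 * lam * x**2."""
    return x - eta * lam * x


def sam_step(x, lam, eta, rho):
    """One SAM step on the same loss (assumed form: normalized ascent of radius rho).

    The gradient is evaluated at the perturbed point x + rho * g / |g|.
    """
    g = lam * x
    if g == 0.0:
        return x  # no ascent direction at a stationary point
    x_perturbed = x + rho * g / abs(g)
    return x - eta * lam * x_perturbed


def sam_edge_1d(grad_norm, eta, rho):
    """Curvature threshold for the toy model above.

    Solves lam + rho * lam**2 / grad_norm = 2 / eta for the positive root.
    This is a sketch for the 1-D quadratic only.
    """
    r = grad_norm / (2.0 * rho)
    return r * (math.sqrt(1.0 + 8.0 * rho / (eta * grad_norm)) - 1.0)


def run_gd(lam, eta, steps=50, x0=1.0):
    """Iterate GD and return the final iterate."""
    x = x0
    for _ in range(steps):
        x = gd_step(x, lam, eta)
    return x


if __name__ == "__main__":
    eta, rho = 0.1, 0.1
    print("GD edge 2/eta:", 2.0 / eta)
    print("GD, lam=19:", abs(run_gd(19.0, eta)))  # below the edge: shrinks
    print("GD, lam=21:", abs(run_gd(21.0, eta)))  # above the edge: grows
    for g in (0.1, 1.0, 10.0):
        print("toy SAM edge at grad norm", g, "=", sam_edge_1d(g, eta, rho))
```

As the gradient norm shrinks, the toy SAM threshold moves further below $2/η$, which is one way to read the abstract's remark that the SAM-edge, unlike the GD edge, depends on the norm of the gradient.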

Code Implementations: 1 repo
