LG | OC | ML | Nov 10, 2022

How Does Sharpness-Aware Minimization Minimize Sharpness?

arXiv:2211.05729v2 | 74 citations | h-index: 29
Originality: Incremental advance
AI Analysis

This resolves theoretical ambiguities for researchers and practitioners using SAM to improve generalization in deep learning, though it is incremental as it builds on existing SAM work.

The paper clarifies the exact sharpness notion regularized by Sharpness-Aware Minimization (SAM), showing that approximations in its original motivation lead to inaccurate conclusions individually but combine to reveal the correct effect with full-batch gradients, and proves that the stochastic version regularizes a different sharpness notion linked to practical performance.

Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks in various settings. However, the inner workings of SAM remain elusive because of several intriguing approximations in its theoretical characterizations. SAM intends to penalize one notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences between these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two approximation steps in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect when full-batch gradients are applied. Furthermore, we prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of the Hessian when SAM is applied.
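
For orientation, the "computationally efficient variant" that SAM implements can be sketched as a two-stage update: perturb the weights a distance rho along the normalized gradient, then apply the gradient taken at that perturbed point to the original weights. The snippet below is a minimal illustration under stated assumptions, not the paper's code. The toy quadratic loss, the names `loss_quadratic`, `grad_quadratic`, and `sam_step`, and the values of `lr` and `rho` are all hypothetical choices made for this example.

```python
import math

def loss_quadratic(x, curvatures):
    """Toy loss L(x) = 0.5 * sum_i c_i * x_i^2 (an assumption, not from the paper)."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(curvatures, x))

def grad_quadratic(x, curvatures):
    """Gradient of the toy loss above."""
    return [c * xi for c, xi in zip(curvatures, x)]

def sam_step(x, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One full-batch SAM step on parameters x (a list of floats)."""
    g = grad_fn(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    # Ascent stage: move a distance rho along the normalized gradient.
    x_adv = [xi + rho * gi / (norm + eps) for xi, gi in zip(x, g)]
    # Descent stage: use the gradient at the perturbed point,
    # but apply it to the original parameters.
    g_adv = grad_fn(x_adv)
    return [xi - lr * gi for xi, gi in zip(x, g_adv)]

# Illustrative usage: one flat direction (curvature 1) and one sharp one (curvature 10).
curvatures = [1.0, 10.0]
grad_fn = lambda p: grad_quadratic(p, curvatures)
x0 = [1.0, 1.0]
x1 = sam_step(x0, grad_fn)
print(loss_quadratic(x0, curvatures), loss_quadratic(x1, curvatures))
```

Setting `rho=0` reduces the step to plain gradient descent, which makes the extra ascent stage easy to isolate when experimenting. The stochastic variant discussed in the abstract would evaluate both gradients on the same mini-batch instead of the full loss.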

