LG, OC, Jan 19, 2023

An SDE for Modeling SAM: Theory and Insights

arXiv:2301.08203v3, 25 citations, h-index: 44
Originality: Incremental advance
AI Analysis

This work offers theoretical insight into SAM's behavior. The advance is incremental, but it clarifies the mechanisms behind SAM's optimization behavior in machine learning.

The paper derives continuous-time stochastic differential equation (SDE) models for the SAM optimizer and two of its variants. These models are rigorous approximations of the discrete-time algorithms. They explain SAM's preference for flat minima through implicit regularization, and they show that SAM is attracted to saddle points under certain conditions.

We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the learning rate). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones, by showing that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
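
For readers unfamiliar with the discrete-time algorithm that these SDEs approximate, the following is a minimal sketch of one full-batch SAM step as it is commonly stated: take a normalized ascent step of radius rho, then descend using the gradient evaluated at the perturbed point. It is not the paper's SDE model and is not taken from the paper's code. The names `sam_step`, `quad_grad`, and `HESSIAN_DIAG`, the toy quadratic loss, and the hyperparameter values are all illustrative assumptions.

```python
import math


def sam_step(x, grad_fn, lr, rho):
    """One full-batch SAM step: x - lr * grad(x + rho * g / ||g||), with g = grad(x)."""
    g = grad_fn(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # At a critical point the ascent direction is undefined; skip the perturbation.
        x_adv = list(x)
    else:
        # Ascent step of length rho along the normalized gradient.
        x_adv = [xi + rho * gi / norm for xi, gi in zip(x, g)]
    # Descend from the original point using the gradient at the perturbed point.
    g_adv = grad_fn(x_adv)
    return [xi - lr * gi for xi, gi in zip(x, g_adv)]


# Hypothetical toy loss f(x) = 0.5 * (10 * x0^2 + 0.1 * x1^2):
# one sharp direction and one flat direction.
HESSIAN_DIAG = (10.0, 0.1)


def quad_grad(x):
    """Gradient of the toy quadratic loss above."""
    return [h * xi for h, xi in zip(HESSIAN_DIAG, x)]


x = [1.0, 1.0]
for _ in range(50):
    x = sam_step(x, quad_grad, lr=0.05, rho=0.1)
```

Setting `rho=0` recovers plain gradient descent, which makes the perturbation radius the single knob separating the two methods in this sketch.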

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
