LG OC · Jan 19, 2023

An SDE for Modeling SAM: Theory and Insights

Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi

arXiv:2301.08203v3 · 18.025 citations · h-index: 44

Originality: Incremental advance

AI Analysis

This work offers theoretical insights into SAM's behavior. The advance is incremental, but it clarifies mechanisms relevant to improving optimization in machine learning.

The paper derives continuous-time stochastic differential equation (SDE) models for the SAM optimizer and its variants. These models are rigorous approximations of the discrete-time algorithms. They explain SAM's preference for flat minima through implicit regularization, and they show that SAM is attracted to saddle points under certain conditions.

We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the learning rate). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones, by showing that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments.
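To make the "implicitly regularized loss" concrete, here is a minimal sketch of a full-batch SAM step, assuming the standard update x_{k+1} = x_k - lr * grad f(x_k + rho * grad f(x_k) / ||grad f(x_k)||). It is not the authors' code: the names `sam_step`, `reg_grad`, and the toy matrix `H` are hypothetical and chosen only for illustration. On a quadratic loss the SAM descent direction coincides with the gradient of f(x) + rho * ||grad f(x)||. For general losses the match should only be expected to first order in rho.

```python
import numpy as np

def sam_step(x, grad, lr=0.1, rho=0.05, eps=1e-12):
    """One full-batch SAM update (illustrative sketch, not the paper's code)."""
    g = grad(x)
    # Perturb the iterate by a step of radius rho along the normalized gradient.
    x_adv = x + rho * g / (np.linalg.norm(g) + eps)
    # Descend using the gradient evaluated at the perturbed point.
    return x - lr * grad(x_adv)

# Toy quadratic f(x) = 0.5 * x^T H x with one sharp and one flat direction
# (hypothetical example, chosen so that gradients are available in closed form).
H = np.diag([10.0, 0.1])

def grad(x):
    return H @ x

def reg_grad(x, rho):
    """Gradient of the regularized loss f(x) + rho * ||grad f(x)|| for the quadratic."""
    g = H @ x
    return g + rho * (H @ g) / np.linalg.norm(g)

x = np.array([1.0, 1.0])
lr, rho = 0.1, 0.05
x_new = sam_step(x, grad, lr=lr, rho=rho)

# The direction SAM actually moved along, recovered from the update.
sam_direction = (x - x_new) / lr
print(np.allclose(sam_direction, reg_grad(x, rho)))
```

The extra term rho * H @ g / ||g|| pulls hardest along high-curvature directions, which is one way to read the preference for flat minima described above. The noise structure of the mini-batch SDEs is not modeled in this sketch.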
