CV, LG | Feb 24, 2024

Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization

arXiv:2403.08821v1 | 2 citations | h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck that researchers and practitioners face when using SAM to improve model generalization. It is an incremental improvement.

The paper tackles the high computational cost of Sharpness-aware Minimization (SAM) by proposing an adaptive sampling method based on the variation of a gradient component, achieving state-of-the-art accuracies comparable to SAM while significantly accelerating training.

Sharpness-aware Minimization (SAM) has been proposed recently to improve model generalization ability. However, SAM calculates the gradient twice in each optimization step, thereby doubling the computation costs compared to stochastic gradient descent (SGD). In this paper, we propose a simple yet efficient sampling method to significantly accelerate SAM. Concretely, we discover that the gradient of SAM is a combination of the gradient of SGD and the Projection of the Second-order gradient matrix onto the First-order gradient (PSF). PSF exhibits a gradually increasing frequency of change during the training process. To leverage this observation, we propose an adaptive sampling method based on the variation of PSF, and we reuse the sampled PSF for non-sampling iterations. Extensive empirical results illustrate that the proposed method achieved state-of-the-art accuracies comparable to SAM on diverse network architectures.
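
The decomposition described in the abstract can be illustrated with a minimal sketch. To first order, the SAM gradient at w + rho * g / ||g|| equals the plain gradient g plus rho * H g / ||g||, where H is the Hessian; the second term plays the role of PSF. The example below is not the authors' implementation: it assumes a toy quadratic loss (for which the expansion is exact), and it replaces the paper's adaptive, variation-based sampling with a fixed interval. The names `perturbed_gradient`, `train_with_psf_reuse`, and `sample_every` are hypothetical.

```python
import numpy as np

def loss(w, A, b):
    # Toy quadratic loss (an assumption for illustration): 0.5 * w^T A w - b^T w
    return 0.5 * w @ A @ w - b @ w

def grad(w, A, b):
    # Gradient of the quadratic loss; its Hessian is the constant matrix A.
    return A @ w - b

def perturbed_gradient(w, g, A, b, rho):
    # SAM's second gradient call: evaluate the gradient at the perturbed
    # point w + rho * g / ||g||.
    w_adv = w + rho * g / (np.linalg.norm(g) + 1e-12)
    return grad(w_adv, A, b)

def train_with_psf_reuse(w, A, b, rho=0.05, lr=0.1, steps=50, sample_every=5):
    # Fixed-interval sampling is a simplification; the paper adapts the
    # sampling rate to the estimated variation of PSF.
    psf = np.zeros_like(w)
    grad_evals = 0
    for t in range(steps):
        g = grad(w, A, b)
        grad_evals += 1
        if t % sample_every == 0:
            # Sampling iteration: pay for the second gradient and refresh PSF.
            psf = perturbed_gradient(w, g, A, b, rho) - g
            grad_evals += 1
        # Non-sampling iterations reuse the most recently sampled PSF.
        w = w - lr * (g + psf)
    return w, grad_evals

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
w0 = np.array([3.0, -2.0])
w_final, evals = train_with_psf_reuse(w0, A, b)
print(evals)  # 60 gradient evaluations; SAM at every step would need 100
```

In this sketch the cost sits between SGD (50 evaluations) and SAM (100 evaluations), which is the trade-off that sampling PSF is meant to exploit.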

