LG · AI · Sep 22, 2025

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

arXiv:2509.18001v2 · 2 citations · h-index: 29
Originality: Incremental advance
AI Analysis

The paper offers a theoretical account of m-sharpness and a practical method for improving generalization in deep learning training, though it builds incrementally on existing SAM techniques.

The paper investigates m-sharpness in Sharpness-Aware Minimization (SAM), the observation that performance improves as the micro-batch size used to compute perturbations shrinks. It introduces Reweighted SAM (RW-SAM) to mimic these benefits while remaining parallelizable, and reports experiments that validate the approach.

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size for computing perturbations decreases. In practice, the empirical m-sharpness effect underpins the deployment of SAM in distributed training, yet a rigorous theoretical account has remained lacking. To provide a theoretical explanation for m-sharpness, we leverage an extended Stochastic Differential Equation (SDE) framework and analyze the structure of stochastic gradient noise (SGN) to characterize the dynamics of various SAM variants, including n-SAM and m-SAM. Our findings reveal that the stochastic noise introduced during SAM perturbations inherently induces a variance-based sharpness regularization effect. Motivated by our theoretical insights, we introduce Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate both our theoretical analysis and the proposed method.
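
To make the micro-batch idea concrete, here is a minimal NumPy sketch of one m-SAM gradient computation. It is not the paper's implementation and does not cover RW-SAM. It assumes the commonly used formulation in which each micro-batch of size m computes its own normalized-gradient perturbation of radius rho, and the perturbed gradients are then averaged. The least-squares loss and the names `loss_grad`, `m_sam_gradient`, and `rho` are illustrative choices, not taken from the paper.

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the mean squared error 0.5 * mean((X @ w - y)**2) w.r.t. w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def m_sam_gradient(w, X, y, m, rho=0.05, eps=1e-12):
    """Average of SAM gradients computed independently on micro-batches of size m.

    Assumed formulation: each micro-batch perturbs w along its own
    normalized gradient before the gradient is re-evaluated.
    """
    n = len(y)
    if n % m != 0:
        raise ValueError("batch size must be divisible by m")
    grads = []
    for start in range(0, n, m):
        Xb, yb = X[start:start + m], y[start:start + m]
        g = loss_grad(w, Xb, yb)
        # Ascent step of radius rho along this micro-batch's own gradient.
        perturbation = rho * g / (np.linalg.norm(g) + eps)
        # Gradient at the perturbed point, on the same micro-batch.
        grads.append(loss_grad(w + perturbation, Xb, yb))
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

g_full = m_sam_gradient(w, X, y, m=8)   # one perturbation from the whole batch
g_micro = m_sam_gradient(w, X, y, m=2)  # four independent perturbations
```

With `m` equal to the batch size, the sketch reduces to a single shared perturbation. Smaller `m` makes each perturbation depend on a noisier gradient estimate, which is the source of noise the abstract ties to variance-based sharpness regularization. The sequential loop over micro-batches also hints at why small `m` is harder to parallelize.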

