LG AI · Sep 22, 2025

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

Haocheng Luo, Mehrtash Harandi, Dinh Phung, Trung Le

arXiv:2509.18001v2 · 9.42 citations · h-index: 29

Originality: Incremental advance

AI Analysis

The work offers both a theoretical explanation and a practical method for improving generalization when training deep networks, though it builds incrementally on existing SAM techniques.

The paper investigates m-sharpness in Sharpness-Aware Minimization (SAM), the phenomenon in which SAM's performance improves as the micro-batch size used to compute perturbations decreases. It then introduces Reweighted SAM (RW-SAM), which mimics these benefits while remaining parallelizable, and reports experiments that validate the approach.
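For reference, here is a sketch of the standard SAM update (Foret et al., 2021) and its m-SAM variant. The notation is ours, not necessarily the paper's: mini-batch loss $L_{\mathcal{B}}$, perturbation radius $\rho$, step size $\eta$, and $k = |\mathcal{B}|/m$ disjoint micro-batches $\mathcal{B}_j$ of size $m$. The paper's SDE analysis may use a different parameterization.

```latex
% Standard (mini-batch) SAM: one perturbation computed from the whole batch B
\hat{\epsilon}(w) = \rho \, \frac{\nabla L_{\mathcal{B}}(w)}{\lVert \nabla L_{\mathcal{B}}(w) \rVert},
\qquad
w_{t+1} = w_t - \eta \, \nabla L_{\mathcal{B}}\big(w_t + \hat{\epsilon}(w_t)\big)

% m-SAM: each micro-batch B_j gets its own perturbation; gradients are then averaged
w_{t+1} = w_t - \frac{\eta}{k} \sum_{j=1}^{k}
  \nabla L_{\mathcal{B}_j}\!\left( w_t + \rho \, \frac{\nabla L_{\mathcal{B}_j}(w_t)}{\lVert \nabla L_{\mathcal{B}_j}(w_t) \rVert} \right)
```

With $m = |\mathcal{B}|$ (so $k = 1$), the m-SAM update reduces to the standard SAM update.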

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate the phenomenon known as m-sharpness, in which the performance of SAM improves monotonically as the micro-batch size used to compute perturbations decreases. In practice, the empirical m-sharpness effect underpins the deployment of SAM in distributed training, yet a rigorous theoretical account has been lacking. To provide a theoretical explanation for m-sharpness, we leverage an extended Stochastic Differential Equation (SDE) framework and analyze the structure of stochastic gradient noise (SGN) to characterize the dynamics of various SAM variants, including n-SAM and m-SAM. Our findings reveal that the stochastic noise introduced during SAM perturbations inherently induces a variance-based sharpness regularization effect. Motivated by these theoretical insights, we introduce Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate both our theoretical analysis and the proposed method.
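The following is a minimal NumPy sketch of one m-SAM step on a toy least-squares problem. It assumes the textbook SAM perturbation (the normalized gradient scaled by `rho`). The helper names `grad_mse`, `sam_gradient`, `m_sam_step`, and `mse` are hypothetical. This is not the authors' code, and it implements neither n-SAM nor RW-SAM. The sketch is meant to show that each micro-batch's perturbed gradient is computed independently, which is why m-SAM maps naturally onto per-device computation in data-parallel training.

```python
import numpy as np


def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def mse(w, X, y):
    """Toy training loss: 0.5 * mean squared error."""
    return 0.5 * np.mean((X @ w - y) ** 2)


def sam_gradient(w, X, y, rho):
    """Gradient at the SAM-perturbed point w + rho * g / ||g||, with g from (X, y)."""
    g = grad_mse(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids division by zero
    return grad_mse(w + eps, X, y)


def m_sam_step(w, X, y, lr, rho, m):
    """One m-SAM step (hypothetical helper): each size-m micro-batch is perturbed separately.

    m must divide the batch size. With m == len(y), this is a standard mini-batch SAM step.
    """
    n = len(y)
    assert n % m == 0, "micro-batch size m must divide the batch size"
    # The n // m micro-batch computations below are independent, so they could run in parallel.
    grads = [sam_gradient(w, X[i:i + m], y[i:i + m], rho) for i in range(0, n, m)]
    return w - lr * np.mean(grads, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=64)
    w = np.zeros(5)
    for _ in range(200):
        w = m_sam_step(w, X, y, lr=0.1, rho=0.05, m=8)
    print(f"final training loss: {mse(w, X, y):.4f}")
```

Setting `rho=0` recovers plain gradient descent for any `m`, which makes a quick sanity check. The m-sharpness effect itself (better generalization at smaller `m`) is an empirical observation about deep networks and will not necessarily appear on this convex toy problem.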

