LG · Mar 14, 2025

Asynchronous Sharpness-Aware Minimization For Fast and Accurate Deep Learning

arXiv:2503.11147v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the main barrier to adopting SAM in real-world deep learning applications: its computational overhead.

The authors tackle the high computational cost of Sharpness-Aware Minimization (SAM) by proposing an asynchronous-parallel version that breaks the data dependency between the model perturbation and the model update, enabling efficient use of heterogeneous resources such as CPUs and GPUs. The method achieves accuracy comparable to the original SAM on Vision Transformer fine-tuning (CIFAR-100), with training time similar to SGD.
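For context, the data dependency mentioned above comes from the structure of the standard SAM update, which can be written as follows. The symbols $\rho$ (perturbation radius), $\eta$ (learning rate) and $L$ (training loss) are notation chosen here for illustration and are not taken from the summary.

$$
\hat{\epsilon}(w_t) = \rho \, \frac{\nabla L(w_t)}{\lVert \nabla L(w_t) \rVert_2},
\qquad
w_{t+1} = w_t - \eta \, \nabla L(w) \big|_{w = w_t + \hat{\epsilon}(w_t)}
$$

The second gradient cannot start until the first has finished, so each SAM step costs roughly two sequential gradient computations. A first-order Taylor expansion also shows where the gradient-norm penalty comes from:

$$
L\big(w + \hat{\epsilon}(w)\big) \approx L(w) + \hat{\epsilon}(w)^{\top} \nabla L(w) = L(w) + \rho \, \lVert \nabla L(w) \rVert_2
$$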

Sharpness-Aware Minimization (SAM) is an optimization method that improves the generalization performance of machine learning models. Despite its superior generalization, SAM has not been widely used in real-world applications because of its high computational cost. In this work, we propose a novel asynchronous-parallel SAM that achieves nearly the same gradient-norm penalizing effect as the original SAM while breaking the data dependency between the model perturbation and the model update. The proposed asynchronous SAM can even hide the model perturbation time entirely by adjusting the batch size for the model perturbation in a system-aware manner. The proposed method thus makes it possible to fully utilize heterogeneous system resources such as CPUs and GPUs. Our extensive experiments demonstrate the practical benefits of the proposed asynchronous approach. For example, the asynchronous SAM achieves Vision Transformer fine-tuning accuracy on CIFAR-100 comparable to that of the original SAM while having almost the same training time as SGD.
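To make the dependency between perturbation and update concrete, here is a minimal NumPy sketch on a toy quadratic loss. It is not the authors' algorithm. `sam_step` follows the standard SAM recipe, while `async_style_sam_step` is an assumed simplification in which the perturbation is built from a gradient computed at the previous iterate, so the two gradient computations no longer have to run back to back. All function names are hypothetical, and the abstract's system-aware batch-size adjustment and actual CPU/GPU parallelism are not modeled.

```python
import numpy as np

def loss_and_grad(w, A):
    """Toy quadratic loss 0.5 * w^T A w and its gradient A w (A symmetric)."""
    g = A @ w
    return 0.5 * float(w @ g), g

def perturbation(g, rho):
    """SAM-style ascent direction: rescale the gradient to norm rho."""
    return rho * g / (np.linalg.norm(g) + 1e-12)

def sam_step(w, A, lr, rho):
    """Standard SAM: the second gradient must wait for the first."""
    _, g = loss_and_grad(w, A)
    _, g_sam = loss_and_grad(w + perturbation(g, rho), A)
    return w - lr * g_sam

def async_style_sam_step(w, stale_g, A, lr, rho):
    """Assumed variant: perturb with a gradient from the previous iterate."""
    _, g_sam = loss_and_grad(w + perturbation(stale_g, rho), A)
    # In a real system this gradient would be computed concurrently on
    # another device; here it is computed serially and only consumed
    # at the next step, which is what makes it one step stale.
    _, next_stale_g = loss_and_grad(w, A)
    return w - lr * g_sam, next_stale_g

def run(A, w0, steps=200, lr=0.05, rho=0.05):
    """Run both variants from the same starting point."""
    w_sam = w0.copy()
    w_async = w0.copy()
    _, stale_g = loss_and_grad(w_async, A)
    for _ in range(steps):
        w_sam = sam_step(w_sam, A, lr, rho)
        w_async, stale_g = async_style_sam_step(w_async, stale_g, A, lr, rho)
    return w_sam, w_async

if __name__ == "__main__":
    A = np.diag([1.0, 10.0])
    w_sam, w_async = run(A, np.array([1.0, 1.0]))
    print("SAM loss:", loss_and_grad(w_sam, A)[0])
    print("async-style loss:", loss_and_grad(w_async, A)[0])
```

On this toy problem both variants drive the loss close to zero. The point of the sketch is only that the perturbation gradient in the second variant does not depend on work done within the same step.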

