LG, CV | Jun 7, 2023

Normalization Layers Are All That Sharpness-Aware Minimization Needs

Maximilian Mueller, Tiffany Vlaar, David Rolnick, Matthias Hein

arXiv:2306.04226v2 | 20.735 citations | h-index: 56 | Has Code

Originality: Incremental advance

AI Analysis

This work provides an efficient method for enhancing generalization in deep learning, but it is incremental as it builds on existing SAM techniques.

The paper addresses generalization in neural networks by showing that perturbing only the affine normalization parameters (typically about 0.1% of all parameters) in the adversarial step of Sharpness-Aware Minimization (SAM) can outperform perturbing all parameters, for both ResNet and Vision Transformer architectures.

Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness.
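The idea in the abstract can be illustrated with a minimal sketch: a SAM-style update in which the adversarial (ascent) perturbation is applied only to the normalization parameters, while the descent step still updates every parameter. This is not the authors' implementation. The function names (`sam_on_step`, `toy_grad`), the parameter names, the toy scalar loss, and the choice to normalize the perturbation by the gradient norm of the perturbed subset alone are all assumptions made for illustration.

```python
import math


def sam_on_step(params, grad_fn, is_norm_param, lr=0.1, rho=0.05):
    """One SAM-style step that perturbs only selected parameters.

    params:        dict mapping parameter name -> float
    grad_fn:       callable(params) -> dict of loss gradients, same keys
    is_norm_param: callable(name) -> bool, selects the parameters to perturb
    (All names here are hypothetical; scalars stand in for tensors.)
    """
    # 1) Gradient at the current point.
    grads = grad_fn(params)

    # 2) Ascent step restricted to the normalization parameters.
    #    Assumption: the step has length rho measured over that subset only.
    norm_names = [n for n in params if is_norm_param(n)]
    gnorm = math.sqrt(sum(grads[n] ** 2 for n in norm_names))
    perturbed = dict(params)
    if gnorm > 0:
        for n in norm_names:
            perturbed[n] = params[n] + rho * grads[n] / gnorm

    # 3) Gradient at the perturbed point, applied to ALL original parameters.
    adv_grads = grad_fn(perturbed)
    return {n: params[n] - lr * adv_grads[n] for n in params}


def toy_grad(p, x=2.0, y=3.0):
    """Gradient of the toy loss (gamma * w * x + beta - y)^2."""
    r = p["norm.gamma"] * p["fc.weight"] * x + p["norm.beta"] - y
    return {
        "fc.weight": 2 * r * p["norm.gamma"] * x,
        "norm.gamma": 2 * r * p["fc.weight"] * x,
        "norm.beta": 2 * r,
    }


params = {"fc.weight": 0.5, "norm.gamma": 1.0, "norm.beta": 0.0}
new_params = sam_on_step(params, toy_grad, lambda n: n.startswith("norm."))
```

Setting `rho=0` reduces the step to plain gradient descent, and replacing the selector with `lambda n: True` recovers the usual SAM perturbation over all parameters under the same assumptions.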
