LG · CV · Dec 21, 2023

CR-SAM: Curvature Regularized Sharpness-Aware Minimization

arXiv:2312.13555v2 · 12.3 · 15 citations · h-index: 101 · Has Code · AAAI

Originality: Incremental advance

AI Analysis

This work addresses the generalization challenge in deep learning and is relevant to practitioners, but it is incremental because it builds on existing SAM methods.

The paper tackles the problem of improving generalization in deep neural networks by addressing the growing non-linearity of the loss landscape during training, which reduces the effectiveness of Sharpness-Aware Minimization (SAM). It introduces Curvature Regularized SAM (CR-SAM), which improves classification performance on CIFAR and ImageNet for ResNet and Vision Transformer models.

The capacity to generalize to future unseen data is one of the most crucial attributes of deep neural networks. Sharpness-Aware Minimization (SAM) aims to improve generalization by minimizing the worst-case loss in a neighborhood of the current weights, using one-step gradient ascent as an approximation. However, as training progresses, the non-linearity of the loss landscape increases, making one-step gradient ascent less effective, while multi-step gradient ascent incurs a higher training cost. In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both the training and test sets. In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM), which integrates the normalized Hessian trace as a SAM regularizer. Additionally, we present an efficient way to compute the trace via finite differences with parallelism. Our theoretical analysis based on PAC-Bayes bounds establishes the regularizer's efficacy in reducing generalization error. Empirical evaluation on CIFAR and ImageNet shows that CR-SAM consistently improves classification performance for ResNet and Vision Transformer (ViT) models. Our code is available at https://github.com/TrustAIoT/CR-SAM.
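The abstract names two computational ingredients: SAM's one-step gradient ascent and a Hessian trace computed by finite differences. The following is a minimal sketch of both on a toy quadratic loss, which is an assumption chosen so that the exact Hessian trace is known. It uses Hutchinson's randomized trace estimator with central-difference Hessian-vector products, a standard technique. The helper names (`make_quadratic`, `sam_step`, `hutchinson_trace_fd`) and the defaults for `rho`, `lr`, `h`, and `num_samples` are hypothetical. The exact finite-difference scheme and the normalization CR-SAM applies to the trace are defined in the paper and are not reproduced here.

```python
import numpy as np


def make_quadratic(A, b):
    """Hypothetical toy loss L(w) = 0.5 * w^T A w + b^T w.
    Its gradient is A w + b and its Hessian is A, so Tr(H) = trace(A)."""
    def loss(w):
        return 0.5 * w @ A @ w + b @ w

    def grad(w):
        return A @ w + b

    return loss, grad


def sam_step(w, grad, rho=0.05, lr=0.1):
    """One plain SAM update (not CR-SAM): a single gradient-ascent step to the
    approximate worst-case point on an L2 ball of radius rho, then a descent
    step using the gradient evaluated at that perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction scaled to rho
    return w - lr * grad(w + eps)


def hutchinson_trace_fd(w, grad, num_samples=16, h=1e-3, seed=0):
    """Estimate Tr(H) at w with Hutchinson's estimator, Tr(H) = E[v^T H v] for
    Rademacher v, where each Hessian-vector product is approximated by a
    central finite difference of gradients:
        H v ~ (grad(w + h v) - grad(w - h v)) / (2 h).
    The probes are independent, so their gradients could be computed in parallel."""
    rng = np.random.default_rng(seed)
    probes = rng.choice([-1.0, 1.0], size=(num_samples, w.size))
    estimates = []
    for v in probes:
        hv = (grad(w + h * v) - grad(w - h * v)) / (2.0 * h)
        estimates.append(v @ hv)  # one sample of v^T H v
    return float(np.mean(estimates))


if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 3.0])
    b = np.array([0.5, -0.5, 1.0])
    loss, grad = make_quadratic(A, b)
    w = np.array([1.0, 1.0, 1.0])

    # Close to 6.0; for a diagonal Hessian every Rademacher sample is exact.
    print(hutchinson_trace_fd(w, grad))

    w_next = sam_step(w, grad)
    print(loss(w), loss(w_next))  # the SAM step lowers this convex toy loss
```

Because the gradients at `w + h*v` and `w - h*v` for different probes do not depend on each other, they can be evaluated concurrently or as a single batch; this is one way the parallelism mentioned in the abstract could be realized, though the paper's own implementation may differ.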

