LG, ML | Oct 18, 2024

Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

ETH Zurich
arXiv:2410.14802v1 | 12 citations | h-index: 7 | NIPS
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency and generalization for finetuning large language models with methods like LoRA, representing an incremental improvement over SAM.

The paper studies how to improve generalization in scale-invariant deep learning problems by analyzing the implicit regularization of Sharpness-Aware Minimization (SAM) through a quantity called balancedness, the difference between the squared norms of two groups of variables. Building on that analysis, it proposes a SAM variant, Balancedness-Aware Regularization (BAR), which saves 95% of SAM's computational overhead while improving test performance across tasks on RoBERTa, GPT2, and OPT-1.3B.

Sharpness-aware minimization (SAM) improves generalization on various deep learning tasks. Motivated by popular architectures such as LoRA, we explore the implicit regularization of SAM for scale-invariant problems involving two groups of variables. Instead of focusing on the commonly used sharpness, this work introduces a concept termed balancedness, defined as the difference between the squared norms of the two variables. This allows us to depict richer global behaviors of SAM. In particular, our theoretical and empirical findings reveal that i) SAM promotes balancedness; and ii) the regularization on balancedness is data-responsive -- outliers have a stronger impact. The latter coincides with empirical observations that SAM outperforms SGD in the presence of outliers. Leveraging the implicit regularization, we develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems such as finetuning language models with LoRA. BAR saves 95% of the computational overhead of SAM, with enhanced test performance across various tasks on RoBERTa, GPT2, and OPT-1.3B.
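To make the abstract's terms concrete, here is a minimal sketch on a toy problem. It is not the paper's code and it is not BAR. The scalar objective `0.5 * (x*y - 1)**2`, the starting point, and the values of `lr` and `rho` are assumptions chosen for illustration. The objective is scale-invariant, since replacing `(x, y)` with `(a*x, y/a)` leaves the loss unchanged, and balancedness is taken as `x**2 - y**2`, following the definition above. `sam_step` uses the standard SAM update, in which the gradient is evaluated at a point perturbed along the normalized gradient.

```python
import math

def loss(x, y):
    # Toy scale-invariant objective (an assumption, not from the paper):
    # it depends on x and y only through the product x * y.
    return 0.5 * (x * y - 1.0) ** 2

def grad(x, y):
    # Gradient of the toy loss with respect to (x, y).
    r = x * y - 1.0
    return r * y, r * x

def balancedness(x, y):
    # Difference between the squared norms of the two variables.
    return x * x - y * y

def gd_step(x, y, lr):
    # Plain gradient descent step.
    gx, gy = grad(x, y)
    return x - lr * gx, y - lr * gy

def sam_step(x, y, lr, rho):
    # Standard SAM step: move to x + rho * g / ||g||, take the gradient
    # there, and apply it at the original point.
    gx, gy = grad(x, y)
    norm = math.hypot(gx, gy)
    if norm == 0.0 or rho == 0.0:
        return gd_step(x, y, lr)
    ax, ay = x + rho * gx / norm, y + rho * gy / norm
    sx, sy = grad(ax, ay)
    return x - lr * sx, y - lr * sy

if __name__ == "__main__":
    x0, y0, lr, rho = 2.0, 0.1, 0.01, 0.1  # illustrative values only
    print("initial |balancedness|:", abs(balancedness(x0, y0)))
    print("after one GD step:     ", abs(balancedness(*gd_step(x0, y0, lr))))
    print("after one SAM step:    ", abs(balancedness(*sam_step(x0, y0, lr, rho))))
```

On this toy problem with these values, one SAM step shrinks the absolute balancedness more than one gradient descent step from the same point. This matches the claim that SAM promotes balancedness, but it is a single hand-picked example and not evidence for the general result.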
