CV · LG · Aug 11, 2025

Exploiting Layer Normalization Fine-tuning in Visual Transformer Foundation Models for Classification

arXiv:2508.07577v1 · 2 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work studies how to fine-tune LayerNorm parameters in Vision Transformers effectively. It targets researchers and practitioners who face data scarcity and domain shift in transfer learning. The contribution appears incremental, since it builds on existing LayerNorm mechanisms.

This paper investigates how shifts in LayerNorm parameters during fine-tuning of Vision Transformers reflect the domain transition between source and target data. It shows that the usefulness of these shifts depends on how well the training samples represent the target domain, as quantified by a Fine-tuning Shift Ratio (FSR). The authors propose a rescaling mechanism that uses a scalar λ, negatively correlated with FSR, to align the learned LayerNorm shifts with ideal ones, and report improved performance on natural and pathological image datasets in both in-distribution and out-of-distribution settings.

LayerNorm is pivotal in Vision Transformers (ViTs), yet its fine-tuning dynamics under data scarcity and domain shifts remain underexplored. This paper shows that shifts in LayerNorm parameters after fine-tuning (LayerNorm shifts) indicate the transition between source and target domains; their efficacy depends on how accurately the target training samples represent the target domain, as quantified by our proposed Fine-tuning Shift Ratio ($FSR$). Building on this, we propose a simple yet effective rescaling mechanism using a scalar $\lambda$ that is negatively correlated with $FSR$ to align the learned LayerNorm shifts with the ideal shifts that would be obtained under fully representative data, combined with a cyclic framework that further enhances LayerNorm fine-tuning. Extensive experiments across natural and pathological images, in both in-distribution (ID) and out-of-distribution (OOD) settings, and under various target training sample regimes validate our framework. Notably, OOD tasks tend to yield lower $FSR$ and higher $\lambda$ than ID cases, especially with scarce data, indicating under-represented target training samples. Moreover, ViT foundation models (ViTFs) fine-tuned on pathological data behave more like ID settings, favoring conservative LayerNorm updates. Our findings illuminate the underexplored dynamics of LayerNorm in transfer learning and provide practical strategies for LayerNorm fine-tuning.
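The rescaling idea can be illustrated with a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes the LayerNorm shift is the element-wise difference between the fine-tuned and pre-trained LayerNorm parameters (γ and β), and that rescaling multiplies this difference by a single scalar λ. The function names, the dictionary-of-lists parameter layout and the parameter names are hypothetical. λ is taken as a given input, because the mapping from FSR to λ is not specified here, and the cyclic framework is not modeled.

```python
import math
from typing import Dict, List

# Hypothetical parameter container: maps a LayerNorm parameter name
# (e.g. "blocks.0.norm1.weight") to its flat list of values.
Params = Dict[str, List[float]]


def rescale_layernorm_shift(pretrained: Params, finetuned: Params, lam: float) -> Params:
    """Return pretrained + lam * (finetuned - pretrained) for each LayerNorm tensor.

    Assumption: the "LayerNorm shift" is the element-wise difference between
    fine-tuned and pre-trained values, rescaled by one scalar lam.
    """
    rescaled: Params = {}
    for name, theta0 in pretrained.items():
        theta_ft = finetuned[name]
        if len(theta0) != len(theta_ft):
            raise ValueError(f"shape mismatch for {name}")
        # Scale only the learned shift; the pre-trained value is the anchor.
        rescaled[name] = [p + lam * (f - p) for p, f in zip(theta0, theta_ft)]
    return rescaled


def layer_norm(x: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-6) -> List[float]:
    """Standard LayerNorm over one token's feature vector, using gamma and beta."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the paper.
    pre = {"norm.weight": [1.0, 1.0], "norm.bias": [0.0, 0.0]}
    ft = {"norm.weight": [1.2, 0.9], "norm.bias": [0.1, -0.2]}
    out = rescale_layernorm_shift(pre, ft, lam=2.0)
    print(out["norm.weight"])  # approximately [1.4, 0.8]
    print(layer_norm([1.0, 3.0], out["norm.weight"], out["norm.bias"]))
```

Under this reading, λ = 1 keeps the fine-tuned values, λ = 0 reverts to the pre-trained ones, λ < 1 gives a more conservative update, and λ > 1 amplifies the learned shift.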
