CLLGMar 29, 2024

LayerNorm: A key component in parameter-efficient fine-tuning

arXiv:2403.20284v15 citationsh-index: 8
Originality Incremental advance
AI Analysis

This addresses the computational expense of fine-tuning large NLP models like BERT by enabling parameter-efficient methods, though it is incremental as it builds on existing fine-tuning approaches.

The paper identifies output LayerNorm as the most significantly changing component during fine-tuning of BERT models and shows that fine-tuning only LayerNorm achieves comparable or better performance to full fine-tuning on GLUE tasks, with many tasks solvable by tuning a small subset of LayerNorm with minimal degradation.

Fine-tuning a pre-trained model, such as Bidirectional Encoder Representations from Transformers (BERT), has been proven to be an effective method for solving many natural language processing (NLP) tasks. However, due to the large number of parameters in many state-of-the-art NLP models, including BERT, the process of fine-tuning is computationally expensive. One attractive solution to this issue is parameter-efficient fine-tuning, which involves modifying only a minimal segment of the model while keeping the remainder unchanged. Yet, it remains unclear which segment of the BERT model is crucial for fine-tuning. In this paper, we first analyze different components in the BERT model to pinpoint which one undergoes the most significant changes after fine-tuning. We find that output LayerNorm changes more than any other components when fine-tuned for different General Language Understanding Evaluation (GLUE) tasks. Then we show that only fine-tuning the LayerNorm can reach comparable, or in some cases better, performance to full fine-tuning and other parameter-efficient fine-tuning methods. Moreover, we use Fisher information to determine the most critical subset of LayerNorm and demonstrate that many NLP tasks in the GLUE benchmark can be solved by fine-tuning only a small portion of LayerNorm with negligible performance degradation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes