CL | LG | Jun 17, 2024

A Semantic-Aware Layer-Freezing Approach to Computation-Efficient Fine-Tuning of Language Models

arXiv:2406.11753v3 | 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the high computational cost of fine-tuning large language models, offering a practical solution for researchers and practitioners, though it is incremental as it builds on parameter-efficient fine-tuning methods.

The paper tackles the problem of reducing computation costs in fine-tuning language models by proposing a semantic-aware layer-freezing approach that identifies where to fine-tune based on layer contributions to loss reduction. The results show it is effective and efficient, outperforming existing baselines across various LMs and datasets.

Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on "how to finetune" but neglects the issue of "where to finetune". As a pioneering work on reducing the cost of backpropagation (at the layer level) by answering where to finetune, we conduct a semantic analysis of the LM inference process. We first propose using transition traces of the latent representation to compute deviations (or loss). Then, using a derived scaling-law formula, we estimate the gain of each layer in reducing deviation (or loss). Further, we narrow down the scope for finetuning and study the cost-benefit balance of LM finetuning. We perform extensive experiments across well-known LMs and datasets. The results show that our approach is effective and efficient, and outperforms the existing baselines. Our approach is orthogonal to other techniques for improving finetuning efficiency, such as PEFT methods, and offers practical value for LM finetuning.
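
The idea of choosing where to finetune from per-layer gains can be illustrated with a toy sketch. This is not the paper's scaling-law formula: here the gain of a layer is simply assumed to be the drop in deviation across that layer, and the function names (`layer_gains`, `choose_freeze_point`) and the `budget` parameter are hypothetical. The sketch freezes the bottom layers, since backpropagation does not need to continue below the lowest trainable layer.

```python
from typing import List


def layer_gains(deviations: List[float]) -> List[float]:
    """Gain of each layer, ASSUMED here to be the drop in deviation across it.

    deviations[0] is the deviation before the first layer and
    deviations[i] is the deviation after layer i.
    (Illustrative stand-in; the paper derives its estimate from a scaling law.)
    """
    return [prev - cur for prev, cur in zip(deviations, deviations[1:])]


def choose_freeze_point(deviations: List[float], budget: float = 0.9) -> int:
    """Return how many bottom layers to freeze.

    The top layers are kept trainable until they account for at least
    `budget` (a hypothetical knob) of the total gain; everything below
    that cut is frozen.
    """
    gains = layer_gains(deviations)
    total = sum(gains)
    if total <= 0:
        return 0  # no measurable gain: freeze nothing, fall back to full finetuning

    kept = 0.0
    # Walk from the top layer downwards, accumulating gain.
    for n_trainable, gain in enumerate(reversed(gains), start=1):
        kept += gain
        if kept >= budget * total:
            return len(gains) - n_trainable
    return 0


if __name__ == "__main__":
    # Made-up deviations for a 5-layer model, for illustration only.
    devs = [10.0, 9.5, 9.0, 6.0, 2.0, 1.0]
    print(layer_gains(devs))
    print(choose_freeze_point(devs, budget=0.8))
```

Raising `budget` keeps more layers trainable and lowers the compute savings, which is one simple way to picture the cost-benefit balance the abstract mentions.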

