CL, AI, LG | Oct 18, 2022

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

arXiv:2210.10041v2296 citations | h-index: 25
Originality: Incremental advance
AI Analysis

The paper addresses the computational cost that practitioners face when applying transfer learning in NLP, and offers an incremental improvement over existing methods.

The paper tackles the problem of reducing computation in transfer learning for pretrained language models by selecting which layers to adapt based on hidden state variability, achieving performance that often matches full fine-tuning or adapter-tuning on the GLUE benchmark.

While transferring a pretrained language model, common approaches conventionally attach their task-specific classifiers to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection on which subset of the layers to adapt and where to place the classifier. The goal is to reduce the computation cost of transfer learning methods (e.g., fine-tuning or adapter-tuning) without sacrificing their performance. We propose to select layers based on the variability of their hidden states given a task-specific corpus. We say a layer is already "well-specialized" in a task if the within-class variability of its hidden states is low relative to the between-class variability. Our variability metric is cheap to compute and doesn't need any training or hyperparameter tuning. It is robust to data imbalance and data scarcity. Extensive experiments on the GLUE benchmark demonstrate that selecting layers based on our metric can yield significantly stronger performance than using the same number of top layers and often match the performance of fine-tuning or adapter-tuning the entire language model.
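The "well-specialized" criterion in the abstract can be illustrated with a small sketch. It is not the authors' exact metric: it assumes a simple scalar ratio of within-class to between-class variability (a trace ratio), and the names `variability_ratio` and `score_layers` are hypothetical. Each class is weighted equally, which is one plausible way to keep the score insensitive to class imbalance. In practice, the per-layer hidden states would come from a forward pass of the pretrained model over a labeled task corpus; here they are passed in as plain arrays.

```python
import numpy as np


def variability_ratio(hidden, labels):
    """Within-class / between-class variability of one layer's hidden states.

    hidden: array of shape (n_examples, hidden_dim)
    labels: array of shape (n_examples,) with class ids
    Lower values suggest the layer already separates the classes well.
    Assumption: a simple trace ratio, not necessarily the paper's metric.
    """
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Per-class means; the global mean is the mean of class means so that
    # every class counts equally regardless of how many examples it has.
    class_means = np.stack([hidden[labels == c].mean(axis=0) for c in classes])
    global_mean = class_means.mean(axis=0)

    # Within-class: average squared distance of examples to their class mean,
    # averaged over classes.
    within = np.mean([
        ((hidden[labels == c] - mu) ** 2).sum(axis=1).mean()
        for c, mu in zip(classes, class_means)
    ])

    # Between-class: average squared distance of class means to the global mean.
    between = ((class_means - global_mean) ** 2).sum(axis=1).mean()

    return within / between


def score_layers(per_layer_hidden, labels):
    """Return one variability score per layer (hypothetical helper)."""
    return [variability_ratio(h, labels) for h in per_layer_hidden]


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    # Toy "layers": the first separates the classes tightly, the second does not.
    tight = np.array([[0.0, 0.0], [0.0, 0.2], [10.0, 0.0], [10.0, 0.2]])
    loose = np.array([[0.0, 0.0], [0.0, 4.0], [1.0, 0.0], [1.0, 4.0]])
    scores = score_layers([tight, loose], labels)
    print("scores per layer:", scores)
    print("most specialized layer index:", int(np.argmin(scores)))
```

Under this sketch, a layer with a low score would be a candidate for placing the classifier, since its hidden states already cluster by class. The ratio is undefined when all class means coincide (zero between-class variability), so a real implementation would need to guard against that case.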

