CL | AI | LG | Apr 30, 2020

Investigating Transferability in Pretrained Language Models

arXiv:2004.14975v2
AI Analysis

This work examines how transfer learning operates in pretrained NLP models. Its results expose complexities that challenge existing analysis methods, although the contribution is incremental, since it builds on prior ablation techniques.

The study investigates how pretrained language models transfer knowledge by using partial reinitialization to ablate layers of BERT. It finds that layers with high probing performance on GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. It also finds that the benefit of pretrained parameters depends heavily on finetuning dataset size, ranging from negligible in data-scarce settings to substantial in data-rich ones.

How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing different layers of a pretrained model with random weights, then finetuning the entire model on the transfer task and observing the change in performance. This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. Furthermore, the benefit of using pretrained parameters for a layer varies dramatically with finetuning dataset size: parameters that provide tremendous performance improvement when data is plentiful may provide negligible benefits in data-scarce settings. These results reveal the complexity of the transfer learning process, highlighting the limitations of methods that operate on frozen models or single data samples.
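The ablation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the model is a toy stand-in (a list of layers, each a plain list of floats), and the function name `partially_reinitialize`, the `std=0.02` initialization scale, and the `seed` parameter are all assumptions made for illustration. Finetuning the whole model afterwards, which the method requires, is deliberately left out.

```python
import copy
import random


def partially_reinitialize(pretrained_layers, layers_to_reinit, std=0.02, seed=0):
    """Return a copy of a toy model with the chosen layers replaced by random weights.

    pretrained_layers: list of layers, each a list of floats (hypothetical stand-in
        for the parameters of one pretrained layer).
    layers_to_reinit: indices of the layers to ablate.
    std: standard deviation of the Gaussian used for the new weights (assumed value).
    """
    rng = random.Random(seed)
    # Copy first so the pretrained weights stay available for other ablations.
    model = copy.deepcopy(pretrained_layers)
    for idx in sorted(set(layers_to_reinit)):
        if not 0 <= idx < len(model):
            raise IndexError(f"layer index {idx} out of range")
        # Discard the pretrained weights of this layer and draw fresh ones.
        model[idx] = [rng.gauss(0.0, std) for _ in model[idx]]
    return model


# Toy "pretrained" model: 4 layers with 3 weights each.
pretrained = [[float(i + 1)] * 3 for i in range(4)]

# Ablate the top two layers; the bottom two keep their pretrained weights.
ablated = partially_reinitialize(pretrained, layers_to_reinit=[2, 3])

# In the method described above, the entire ablated model would now be
# finetuned on the transfer task and compared against the unablated baseline.
```

Repeating this for different layer subsets, and for different finetuning dataset sizes, would give the kind of comparison the abstract describes. With a real framework the same idea applies to the parameter tensors of each layer rather than to lists of floats.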
