LG · CL · ML · Feb 2, 2019

Parameter-Efficient Transfer Learning for NLP

arXiv:1902.00751v2 · 6577 citations
AI Analysis

The paper addresses the high computational and storage costs that NLP practitioners face when deploying a model for many tasks. The contribution is incremental, since it builds on existing transfer learning methods.

The paper tackles the parameter inefficiency of fine-tuning large pre-trained NLP models for multiple downstream tasks. It proposes adapter modules, which achieve near state-of-the-art performance while adding only 3.6% additional parameters per task on GLUE, coming within 0.4% of full fine-tuning.
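The storage argument can be made concrete with simple arithmetic. The sketch below counts stored parameters in units of one full pre-trained model: adapters keep a single frozen, shared copy plus a small per-task fraction, while full fine-tuning keeps a complete copy per task. The function names are hypothetical, and applying the GLUE figure of 3.6% uniformly to every task is an assumption made for illustration; actual per-task sizes depend on the adapter size chosen for each task.

```python
def adapter_storage(num_tasks: int, per_task_fraction: float) -> float:
    """Hypothetical helper: stored parameters, in units of one full model.

    One frozen base model is shared by all tasks; each task adds only
    `per_task_fraction` of the base model's parameter count.
    """
    return 1.0 + num_tasks * per_task_fraction


def full_finetune_storage(num_tasks: int) -> float:
    """Hypothetical helper: fine-tuning stores a complete model copy per task."""
    return float(num_tasks)


if __name__ == "__main__":
    tasks = 26          # illustrative task count
    fraction = 0.036    # assumption: GLUE's 3.6% applied to every task
    print(f"adapters:         {adapter_storage(tasks, fraction):.3f} model copies")
    print(f"full fine-tuning: {full_finetune_storage(tasks):.0f} model copies")
```

With 26 tasks this comes to about 1.94 model copies for adapters versus 26 for full fine-tuning, and the gap widens linearly as tasks are added.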

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% additional parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
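To make the mechanism concrete, here is a minimal pure-Python sketch of a bottleneck adapter, the form used in the full paper: project a hidden state of size d down to m, apply a nonlinearity, project back up to d, and add a skip connection. The sketch is illustrative only. The class and helper names are hypothetical, ReLU is an assumed choice of nonlinearity, and zero-initializing the up-projection (so the module starts as the exact identity) is our own way of getting a near-identity start. Placement inside the Transformer layers and the training loop that updates only these weights are not shown.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[row][col]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    """Assumed nonlinearity for this sketch."""
    return [max(0.0, a) for a in v]


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: d -> m -> d, plus a skip connection."""

    def __init__(self, d: int, m: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (m x d) with small random weights.
        self.w_down = [[rng.gauss(0.0, 1e-2) for _ in range(d)] for _ in range(m)]
        self.b_down = [0.0] * m
        # Up-projection (d x m) set to zero, so the adapter starts as the identity.
        self.w_up = [[0.0] * m for _ in range(d)]
        self.b_up = [0.0] * d

    def num_params(self) -> int:
        """Trainable parameters: two weight matrices plus two bias vectors."""
        d, m = len(self.w_up), len(self.w_down)
        return 2 * d * m + d + m

    def __call__(self, h: Vector) -> Vector:
        z = relu([a + b for a, b in zip(matvec(self.w_down, h), self.b_down)])
        delta = [a + b for a, b in zip(matvec(self.w_up, z), self.b_up)]
        # Skip connection: output = h + adapter(h).
        return [hi + di for hi, di in zip(h, delta)]


if __name__ == "__main__":
    d, m = 768, 64  # illustrative sizes: BERT-base hidden size, assumed bottleneck
    # One adapter per task; adding a task never touches earlier adapters.
    per_task = {"task_a": BottleneckAdapter(d, m, seed=1),
                "task_b": BottleneckAdapter(d, m, seed=2)}
    h = [0.5] * d
    print(per_task["task_a"](h) == h)           # identity at initialization
    print(per_task["task_a"].num_params())      # trainable parameters per adapter
```

Each adapter adds 2dm + d + m parameters, so choosing m much smaller than d keeps the per-task cost small while the frozen base model is shared across every task.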

Code Implementations · 17 repos

Data from Papers with Code (CC-BY-SA-4.0)
