LG CL ML | Feb 2, 2019

Parameter-Efficient Transfer Learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

arXiv:1902.00751v2 | 62.46941 citations | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the high computational and storage costs that NLP practitioners face when deploying models for many tasks. The advance is incremental, since it builds on existing transfer learning methods.

The paper tackles the parameter inefficiency of fine-tuning large pre-trained NLP models for multiple downstream tasks by proposing adapter modules. On GLUE, adapters come within 0.4% of the performance of full fine-tuning while adding only 3.6% parameters per task, which is near state-of-the-art.

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapter's effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
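The storage argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the task count of 9 is an assumption about how many GLUE tasks are counted, and the bottleneck layout (a down-projection followed by an up-projection, each with a bias) is an assumed adapter design that the abstract itself does not spell out.

```python
def full_finetuning_total(num_tasks: int) -> float:
    """Stored parameters, in multiples of the base model, when every
    task keeps its own fully fine-tuned copy of the network."""
    return float(num_tasks)


def adapter_total(num_tasks: int, per_task_fraction: float) -> float:
    """Stored parameters, in multiples of the base model, when one frozen
    base model is shared and each task adds a small set of new weights.

    per_task_fraction is the added parameters per task as a fraction of
    the base model (0.036 for the 3.6% figure quoted in the abstract).
    """
    return 1.0 + num_tasks * per_task_fraction


def bottleneck_adapter_params(hidden_size: int, bottleneck_size: int) -> int:
    """Parameter count of one assumed bottleneck adapter:
    down-projection (hidden -> bottleneck) plus bias, and
    up-projection (bottleneck -> hidden) plus bias."""
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


if __name__ == "__main__":
    num_tasks = 9  # assumption: number of GLUE tasks being served
    print("full fine-tuning:", full_finetuning_total(num_tasks), "x base model")
    print("adapters:", round(adapter_total(num_tasks, 0.036), 3), "x base model")
    # Hypothetical sizes, chosen only to show how the count scales.
    print("one adapter (1024 -> 64 -> 1024):", bottleneck_adapter_params(1024, 64))
```

Under these assumptions, the adapter total grows by only a few percent of the base model per added task, whereas full fine-tuning grows by a whole model per task.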
