CL Jul 7, 2022

Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation

Zejiang Hou, Julian Salazar, George Polovets

Amazon

arXiv:2207.03509v124.6304 citations h-index: 11 Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of data- and parameter-efficient adaptation of large language models. The advance is incremental because it builds on existing methods such as adapters and structure learning.

The paper tackles the problem of efficiently adapting large pretrained language models to new tasks or domains by learning the difference between general and adapted models. In experiments on few-shot dialogue completion, low-resource summarization, and multi-domain language modeling, this improves adaptation time and performance over fine-tuning and domain-adaptive pretraining.

Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting. Fine-tuning requires modifying all of the parameters and having enough data to avoid overfitting, while prompting requires no training and few examples but limits performance. Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs. This difference is expressed in terms of model weights and sublayer structure through our proposed dynamic low-rank reparameterization and learned architecture controller. Experiments on few-shot dialogue completion, low-resource abstractive summarization, and multi-domain language modeling show improvements in adaptation time and performance over direct fine-tuning or preparation via domain-adaptive pretraining. Ablations show our task-adaptive reparameterization (TARP) and model search (TAMS) components individually improve on other parameter-efficient transfer methods like adapters and structure-learning methods like learned sparsification.
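The abstract does not spell out how the dynamic low-rank reparameterization works, so the following is only a minimal sketch of the general idea of expressing the difference between a general and an adapted weight matrix as a low-rank product. It is a generic, static low-rank update, not the authors' TARP. All names and sizes (`adapted_weight`, `d_out`, `d_in`, `rank`) are hypothetical and chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def adapted_weight(w, u, v):
    """Return W + U @ V: frozen general weights plus a low-rank difference.

    w: d_out x d_in general (pretrained) weights, kept fixed.
    u: d_out x r and v: r x d_in, the only parameters trained for a new task.
    """
    delta = matmul(u, v)
    return [
        [w_ij + d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes, for illustration only.
d_out, d_in, rank = 8, 6, 2
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Assumption: U starts at zero so the adapted model initially equals the general one.
u = [[0.0] * rank for _ in range(d_out)]
v = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]

full_params = d_out * d_in             # parameters touched by full fine-tuning
delta_params = rank * (d_out + d_in)   # parameters in the low-rank difference
```

With these toy sizes the low-rank difference has 28 trainable values against 48 for the full matrix. The gap widens as the matrix dimensions grow relative to the rank, which is the sense in which such a reparameterization is parameter-efficient.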
