CL · Apr 26, 2020

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models

Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze

arXiv:2004.12406v2 · 31.91034 citations

Originality: Incremental advance

AI Analysis

The method gives NLP practitioners who must serve multiple tasks simultaneously an efficient alternative to finetuning, though the advance is incremental because it builds on existing pretrained models.

The paper tackles the problem of efficiently using pretrained language models by learning selective binary masks instead of finetuning weights, achieving performance comparable to finetuning with a much smaller memory footprint for multi-task inference.

We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
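The core idea in the abstract, one frozen set of pretrained weights shared across tasks plus a binary mask per task, can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration: the tiny weight matrix, the per-task mask scores, the 0.5 threshold, and the assumption that every weight is maskable and stored as a 32-bit float. How the scores are learned is omitted, and the abstract does not specify it.

```python
from typing import Dict, List

Matrix = List[List[float]]


def binarize(scores: Matrix, threshold: float) -> Matrix:
    """Turn real-valued mask scores into a 0/1 mask (thresholding is an assumption)."""
    return [[1.0 if s >= threshold else 0.0 for s in row] for row in scores]


def apply_mask(weights: Matrix, mask: Matrix) -> Matrix:
    """Elementwise product: a 0 in the mask switches the pretrained weight off."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def linear(x: List[float], weights: Matrix) -> List[float]:
    """Compute y = W x for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def finetuning_bits(n_tasks: int, n_params: int, bits_per_weight: int = 32) -> int:
    """Storage if every task keeps its own full copy of the weights."""
    return n_tasks * n_params * bits_per_weight


def masking_bits(n_tasks: int, n_params: int, bits_per_weight: int = 32) -> int:
    """Storage for one shared weight copy plus a 1-bit mask entry per weight per task."""
    return n_params * bits_per_weight + n_tasks * n_params


# Hypothetical frozen "pretrained" weights, shared by all tasks and never modified.
pretrained: Matrix = [[0.5, -1.0, 2.0],
                      [1.5, 0.25, -0.75]]

# Hypothetical per-task mask scores; in practice these would be learned.
task_scores: Dict[str, Matrix] = {
    "task_a": [[0.9, 0.1, 0.8], [0.2, 0.7, 0.6]],
    "task_b": [[0.3, 0.6, 0.9], [0.8, 0.4, 0.1]],
}

THRESHOLD = 0.5
x = [1.0, 2.0, 3.0]

outputs: Dict[str, List[float]] = {}
for name, scores in task_scores.items():
    mask = binarize(scores, THRESHOLD)
    outputs[name] = linear(x, apply_mask(pretrained, mask))

print(outputs)
print(finetuning_bits(4, 1000), masking_bits(4, 1000))
```

Under these assumptions, each extra task costs one bit per weight instead of a full copy of the weights, which is where the smaller multi-task memory footprint comes from. The same input produces different outputs per task even though `pretrained` itself is unchanged.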
