CL AI LG Oct 31, 2022

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao

Baidu

arXiv:2210.17451v215.1191 citations h-index: 59 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the cost and efficiency challenges of deploying large language models across multiple downstream tasks, and represents an incremental improvement over existing parameter-efficient fine-tuning methods.

The paper targets the high storage and computational costs of fine-tuning large pre-trained language models. It proposes AdaMix, a parameter-efficient fine-tuning method that tunes a mixture of adaptation modules, and reports state-of-the-art performance on NLU and NLG tasks while tuning only 0.1-0.2% of parameters.
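To make the 0.1-0.2% figure concrete, the sketch below works through the storage arithmetic. The model size (350 million parameters) and the task count (10) are hypothetical values chosen for illustration, not numbers from the paper, and the function names are ours.

```python
def tunable_range(total_params):
    """Return (low, high) tunable-parameter counts for 0.1% and 0.2% of total.

    Integer arithmetic (per-mille) avoids floating-point rounding.
    """
    return total_params * 1 // 1000, total_params * 2 // 1000


def stored_params(total_params, num_tasks, per_task_params=None):
    """Parameters that must be stored to serve num_tasks tasks.

    Full fine-tuning keeps one complete copy per task. A PEFT setup keeps
    one shared frozen backbone plus a small module per task.
    """
    if per_task_params is None:
        return total_params * num_tasks
    return total_params + per_task_params * num_tasks


TOTAL = 350_000_000  # hypothetical model size, for illustration only
NUM_TASKS = 10       # hypothetical number of downstream tasks

low, high = tunable_range(TOTAL)
full = stored_params(TOTAL, NUM_TASKS)
peft = stored_params(TOTAL, NUM_TASKS, per_task_params=high)

print(f"tunable per task: {low:,} to {high:,}")
print(f"stored with full fine-tuning: {full:,}")
print(f"stored with shared backbone:  {peft:,}")
```

Under these assumptions, ten fully fine-tuned copies hold 3.5 billion parameters, while one shared backbone plus ten small modules holds about 357 million.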

Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters, and storing a large copy of the PLM weights for every task, resulting in increased cost for storing, sharing and serving the models. To address this, parameter-efficient fine-tuning (PEFT) techniques were introduced where small trainable components are injected in the PLM and updated during fine-tuning. We propose AdaMix as a general PEFT method that tunes a mixture of adaptation modules (given the underlying PEFT method of choice) introduced in each Transformer layer while keeping most of the PLM weights frozen. For instance, AdaMix can leverage a mixture of adapters like Houlsby or a mixture of low-rank decomposition matrices like LoRA to improve downstream task performance over the corresponding PEFT methods for fully supervised and few-shot NLU and NLG tasks. Further, we design AdaMix such that it matches the computational cost and the number of tunable parameters of the underlying PEFT method. By only tuning 0.1-0.2% of PLM parameters, we show that AdaMix outperforms SOTA parameter-efficient fine-tuning and full model fine-tuning for both NLU and NLG tasks.
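The idea of a mixture of LoRA-style modules over a frozen weight matrix can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The abstract does not spell out how the mixture is routed or how the cost is matched, so two steps here are assumptions: picking one module uniformly at random per training pass, and averaging the module weights into a single module for serving. All names and sizes are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for pre-trained or trainable weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_lora_module(d, r, rng):
    # LoRA-style pair: down-projection A (r x d) and up-projection B (d x r).
    # Both are random here so that the modules produce different outputs.
    return {"A": make_matrix(r, d, rng), "B": make_matrix(d, r, rng)}


def adapted_forward(frozen_w, module, x):
    # y = W x + B (A x); frozen_w is read but never modified.
    base = matvec(frozen_w, x)
    delta = matvec(module["B"], matvec(module["A"], x))
    return [b + dl for b, dl in zip(base, delta)]


def train_step_forward(frozen_w, modules, x, rng):
    # Assumption: route each training pass through one randomly chosen module,
    # so a pass costs the same as a single-module PEFT pass.
    return adapted_forward(frozen_w, rng.choice(modules), x)


def average_matrices(mats):
    """Element-wise mean of equally shaped matrices."""
    n = len(mats)
    return [[sum(m[i][j] for m in mats) / n for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def merge_modules(modules):
    # Assumption: collapse the mixture into one module by averaging weights,
    # so serving stores and computes only one module's worth of parameters.
    return {"A": average_matrices([m["A"] for m in modules]),
            "B": average_matrices([m["B"] for m in modules])}


def count_params(module):
    return sum(len(row) for mat in module.values() for row in mat)


rng = random.Random(0)
d, r, k = 8, 2, 4  # hidden size, rank, number of modules (all hypothetical)
frozen_w = make_matrix(d, d, rng)
modules = [make_lora_module(d, r, rng) for _ in range(k)]
x = [1.0] * d

y_train = train_step_forward(frozen_w, modules, x, rng)
merged = merge_modules(modules)
y_infer = adapted_forward(frozen_w, merged, x)
```

In this sketch the merged module has `2 * d * r` parameters, the same as any single module in the mixture, which is one way the parameter count of the underlying PEFT method could be matched at serving time.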
