CL · Aug 30, 2023

MerA: Merging Pretrained Adapters For Few-Shot Learning

arXiv:2308.15982v1 · 12 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This paper addresses efficient and effective few-shot learning for NLP practitioners: its method reduces parameters and costs while improving performance, though it is incremental relative to prior adapter-based approaches.

The paper tackles the subpar few-shot performance of adapter tuning by proposing MerA, a method that merges pretrained adapters into a single model. MerA achieves substantial improvements over existing methods, and a "same-track" setting yields further gains, e.g., 3.5% on MRPC and 5.0% on MNLI.

Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Moreover, our preliminary study reveals that even single adapters can outperform AdapterFusion in few-shot learning, urging us to propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "same-track" setting, that merges adapters from the same track of pretraining tasks. With the "same-track" setting, we observe even more impressive gains, surpassing the performance of both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5% in MRPC and 5.0% in MNLI.
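To make the idea of merging adapters concrete, here is a minimal sketch in Python. It assumes the simplest possible form of model fusion, element-wise averaging of same-shaped adapter weights, which is not necessarily the fusion procedure the paper uses. The function names, the parameter name "down", the toy weights, and the task-to-track labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Matrix = List[List[float]]
Adapter = Dict[str, Matrix]  # parameter name -> weight matrix


def merge_adapters(adapters: List[Adapter]) -> Adapter:
    """Average same-named, same-shaped parameters across adapters.

    Assumption: plain averaging stands in for "model fusion" here.
    """
    if not adapters:
        raise ValueError("need at least one adapter to merge")
    names = adapters[0].keys()
    if any(a.keys() != names for a in adapters):
        raise ValueError("adapters must share the same parameter names")
    n = len(adapters)
    merged: Adapter = {}
    for name in names:
        rows = len(adapters[0][name])
        cols = len(adapters[0][name][0])
        merged[name] = [
            [sum(a[name][i][j] for a in adapters) / n for j in range(cols)]
            for i in range(rows)
        ]
    return merged


def same_track(pool: Dict[str, Adapter], tracks: Dict[str, str], track: str) -> List[Adapter]:
    """Select adapters whose (hypothetical) track label matches `track`."""
    return [adapter for task, adapter in pool.items() if tracks[task] == track]


# Toy pool: three pretrained adapters, each with one 2x2 weight matrix.
pool = {
    "qqp": {"down": [[1.0, 0.0], [0.0, 1.0]]},
    "sts-b": {"down": [[3.0, 0.0], [0.0, 3.0]]},
    "snli": {"down": [[0.0, 2.0], [2.0, 0.0]]},
}
# Illustrative grouping of pretraining tasks into tracks (an assumption).
tracks = {"qqp": "similarity", "sts-b": "similarity", "snli": "inference"}

# "Same-track" merging: fuse only the adapters from one track.
merged = merge_adapters(same_track(pool, tracks, "similarity"))
```

The merged result is a single adapter with the same shape as each input, so it adds no parameters at deployment time, in contrast to the extra composition layers the abstract attributes to AdapterFusion.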

