CL, Mar 5, 2023

Effectiveness of Data Augmentation for Parameter Efficient Tuning with Limited Data

Stephen Obadinma, Hongyu Guo, Xiaodan Zhu

arXiv:2303.02577v226.2223 citations, h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving parameter-efficient tuning for NLP practitioners with limited data, though it is incremental as it builds on existing methods.

The study investigates how data augmentation affects parameter-efficient tuning methods such as P-tuning and LoRA under data scarcity. It finds that augmentation can boost performance, but effectiveness varies by technique, and some techniques degrade results, especially with larger models and harder tasks. Adding a contrastive loss improved prefix tuning's performance on augmented data.

Recent work has demonstrated that applying parameter-efficient tuning techniques such as prefix tuning (or P-tuning) to pretrained language models can yield performance comparable or superior to fine-tuning while dramatically reducing the number of trainable parameters. Nevertheless, the effectiveness of such methods in the context of data augmentation, a common strategy for improving learning in low-data regimes, has not been fully explored. In this paper, we examine the effectiveness of several popular task-agnostic data augmentation techniques, i.e., EDA, Back Translation, and Mixup, when using two general parameter-efficient tuning methods, P-tuning v2 and LoRA, under data scarcity. We show that data augmentation can boost the performance of P-tuning and LoRA models, but the effectiveness of each technique varies, and certain methods can lead to a notable degradation in performance, particularly when using larger models and on harder tasks. We further analyze the sentence representations of P-tuning compared to fine-tuning to help understand this behaviour, and reveal that P-tuning generally presents a more limited ability to separate the sentence embeddings of different classes of augmented data. In addition, it displays poorer performance on heavily altered data. However, we demonstrate that adding a simple contrastive loss function can help mitigate such issues for prefix tuning, resulting in sizable improvements in performance on augmented data.
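To make two of the augmentation techniques named in the abstract concrete, here is a minimal sketch of EDA-style token perturbation and Mixup interpolation. It is an illustration only, not the authors' implementation: the function names, the toy sentence, and the parameter values are hypothetical. Only two of the EDA operations (random deletion and random swap) are shown, and the abstract does not say which representations the paper mixes, so Mixup is applied here to plain feature vectors with one-hot labels.

```python
import random


def random_deletion(tokens, p, rng):
    """EDA-style random deletion: drop each token independently with probability p."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence; fall back to one random token.
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """EDA-style random swap: exchange two distinct random positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def mixup(x_a, y_a, x_b, y_b, lam):
    """Mixup: convex combination of two feature vectors and their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Hypothetical toy inputs, chosen only for illustration.
rng = random.Random(0)
tokens = "the movie was surprisingly good".split()
deleted = random_deletion(tokens, p=0.2, rng=rng)
swapped = random_swap(tokens, n_swaps=1, rng=rng)
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.7)
```

In practice the Mixup coefficient `lam` is usually sampled from a Beta distribution rather than fixed; a constant is used here to keep the sketch deterministic.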

