CL · Feb 25, 2022

PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

arXiv:2202.12499v2 · 644 citations
AI Analysis

This paper addresses the shortage of labeled data for natural language understanding tasks. It offers an efficient augmentation method, though an incremental one, since it builds on existing prompt-based techniques.

The paper tackles data augmentation for low-resource natural language understanding tasks by proposing PromDA, a prompt-based method that trains only soft prompts in a frozen pre-trained language model to generate synthetic data, without needing unlabeled in-domain data. On four benchmarks, NLU models trained with this synthetic data consistently outperform competitive baselines, including a state-of-the-art semi-supervised model.

This paper focuses on Data Augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which only trains a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of the generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boosts the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary to unlabeled in-domain data: NLU models improve further when the two are combined for training.
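
The abstract says PromDA filters out low-quality synthetic data using NLU models but does not spell out the rule. One plausible reading, sketched below, is agreement-based filtering: keep a synthetic example only when the label it was generated with matches what an NLU model predicts for its text. This is an illustrative assumption, not the authors' implementation; `consistency_filter`, `toy_predict`, and the sample data are all hypothetical names introduced here.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label the generator was conditioned on)


def consistency_filter(
    synthetic: Iterable[Example],
    predict: Callable[[str], str],
) -> List[Example]:
    """Keep synthetic examples whose generator label matches the model's prediction."""
    return [(text, label) for text, label in synthetic if predict(text) == label]


# Hypothetical stand-in for a trained NLU model: a keyword rule, for illustration only.
def toy_predict(text: str) -> str:
    return "positive" if "great" in text.lower() else "negative"


synthetic_data = [
    ("The film was great fun.", "positive"),
    ("A dull, lifeless plot.", "negative"),
    ("Great cast, great pacing.", "negative"),  # label disagrees with the model
]

kept = consistency_filter(synthetic_data, toy_predict)
print(kept)
```

In practice `predict` would wrap a classifier trained on the few available labeled examples, and the surviving examples would be added to its training set.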

Code Implementations: 1 repo
