CL, LG · Feb 22, 2023

Data Augmentation for Neural NLP

arXiv:2302.11412v12.16 citations · h-index: 3

Originality: Synthesis-oriented

AI Analysis

This is an incremental overview paper for researchers and practitioners in NLP facing data limitations.

The paper addresses data scarcity in NLP by reviewing state-of-the-art data augmentation methods for neural and transformer-based models, focusing on low-cost approaches to reduce labeling costs.

Data scarcity is a problem that occurs in languages and tasks where we do not have large amounts of labeled data but want to use state-of-the-art models. Such models are often deep learning models that require a significant amount of data to train. Acquiring data for various machine learning problems is accompanied by high labeling costs. Data augmentation is a low-cost approach for tackling data scarcity. This paper gives an overview of current state-of-the-art data augmentation methods used for natural language processing, with an emphasis on methods for neural and transformer-based models. Furthermore, it discusses the practical challenges of data augmentation, possible mitigations, and directions for future research.
