AI | Apr 27, 2023

ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT

Solomon Ubani, Suleyman Olcay Polat, Rodney Nielsen

arXiv:2304.14334v1 | 23.866 citations | h-index: 19

Originality: Incremental advance

AI Analysis

This paper addresses data augmentation for machine learning practitioners working in low-resource settings, and presents an incremental improvement over existing methods.

The paper tackles the problem of data scarcity in low-resource scenarios by using ChatGPT to generate synthetic training data, showing that task-specific prompts outperform existing data augmentation approaches.

In this paper, we investigate the use of data obtained from prompting a large generative language model, ChatGPT, to generate synthetic training data with the aim of augmenting data in low resource scenarios. We show that with appropriate task-specific ChatGPT prompts, we outperform the most popular existing approaches for such data augmentation. Furthermore, we investigate methodologies for evaluating the similarity of the augmented data generated from ChatGPT with the aim of validating and assessing the quality of the data generated.
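The abstract names two steps: prompting with a task-specific template, and checking how similar the generated data is to the original data. The sketch below illustrates both under stated assumptions. The prompt wording, the function names (`build_prompt`, `jaccard`, `max_similarity_to_seeds`), and the choice of token-level Jaccard overlap as the similarity measure are all hypothetical and are not taken from the paper. The call to the language model is left out, so the block runs offline.

```python
import re


def build_prompt(task: str, label: str, seed_examples: list[str], n: int) -> str:
    """Assemble a task-specific prompt asking for n new examples of one label.

    The template text is an illustrative assumption, not the paper's prompt.
    """
    shots = "\n".join(f"- {s}" for s in seed_examples)
    return (
        f"Task: {task}\n"
        f"Label: {label}\n"
        f"Existing examples:\n{shots}\n"
        f"Write {n} new, varied examples with the same label, one per line."
    )


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap in [0, 1]; a simple stand-in similarity measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def max_similarity_to_seeds(generated: str, seeds: list[str]) -> float:
    """Highest overlap between one generated example and any seed example."""
    return max(jaccard(generated, s) for s in seeds)


if __name__ == "__main__":
    seeds = ["the movie was great", "I loved every minute of it"]
    print(build_prompt("sentiment classification", "positive", seeds, n=5))
    # A generated example that nearly copies a seed scores close to 1.0;
    # a genuinely new example scores lower.
    print(max_similarity_to_seeds("the movie was great fun", seeds))
```

A score near 1.0 suggests the generated example is close to a copy of a seed, while a score near 0.0 suggests it may have drifted off task. An embedding-based similarity could be swapped in for `jaccard` without changing the rest of the sketch.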
