CL, Aug 30, 2021

AEDA: An Easier Data Augmentation Technique for Text Classification

Akbar Karimi, Leonardo Rossi, Andrea Prati

arXiv:2108.13230v131.2664 citations, Has Code

Originality: Incremental advance

AI Analysis

This is an incremental improvement for researchers and practitioners in NLP: a simpler data augmentation method that, unlike the prior EDA method, avoids information loss.

The paper tackles the problem of data augmentation for text classification by proposing AEDA, a technique that inserts random punctuation marks into text, and shows it outperforms the EDA method across five datasets.

This paper proposes the AEDA (An Easier Data Augmentation) technique to help improve performance on text classification tasks. AEDA consists solely of randomly inserting punctuation marks into the original text. This makes it easier to implement than the EDA method (Wei and Zou, 2019), with which we compare our results. In addition, it keeps the order of the words while changing their positions in the sentence, leading to better generalization. Furthermore, the deletion operation in EDA can cause a loss of information, which in turn misleads the network, whereas AEDA preserves all the input information. Following the baseline, we perform experiments on five different datasets for text classification. We show that models trained on AEDA-augmented data outperform those trained on EDA-augmented data on all five datasets. The source code is available for further study and reproduction of the results.
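The random-insertion idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code: the function name `aeda_augment`, the six-mark punctuation set, and the rule of inserting between 1 and one-third of the word count are all assumptions, since the text above does not specify them.

```python
import random

# Assumed punctuation set; the abstract above does not list the marks used.
PUNCTUATION = [".", ";", "?", ":", "!", ","]


def aeda_augment(sentence, rng=None):
    """Return a copy of `sentence` with random punctuation marks inserted.

    Hypothetical helper: the number of insertions is drawn uniformly from
    1 to one-third of the word count (an assumption, not from the text above).
    """
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        return sentence

    # At least one insertion, at most one-third of the sentence length.
    max_inserts = max(1, len(words) // 3)
    n_inserts = rng.randint(1, max_inserts)

    augmented = list(words)
    for _ in range(n_inserts):
        # Any gap is allowed, including before the first and after the last token.
        position = rng.randint(0, len(augmented))
        augmented.insert(position, rng.choice(PUNCTUATION))
    return " ".join(augmented)


if __name__ == "__main__":
    original = "a sad superior human comedy played out on the back roads of life"
    for _ in range(3):
        print(aeda_augment(original))
```

Because the sketch only inserts tokens, stripping the punctuation from any augmented sentence recovers the original words in their original order. That is the sense in which no input information is lost, in contrast to a deletion-based operation.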
