CL · Jan 31, 2019

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

arXiv:1901.11196v236.22365 citations · Has Code

Originality: Incremental advance

AI Analysis

EDA offers a practical way to improve text classification performance, especially when training data is scarce. The advance is incremental, since it builds on existing augmentation ideas.

The paper addresses limited training data in text classification by introducing EDA, a set of four simple data augmentation techniques. On average across five datasets, training with EDA on only 50% of the training set matched the accuracy of normal training on the full set.

We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
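
The four operations named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code. The toy thesaurus `TOY_SYNONYMS`, the function names, the whitespace tokenization, and the rule `n = max(1, int(alpha * len(words)))` are all assumptions made for this example; a real setup would likely draw synonyms from a lexical resource such as WordNet and skip stop words.

```python
import random

# Hypothetical toy thesaurus, used only so the sketch is self-contained.
TOY_SYNONYMS = {
    "good": ["fine", "great"],
    "movie": ["film"],
    "quick": ["fast", "speedy"],
}


def synonym_replacement(words, n, rng, synonyms=TOY_SYNONYMS):
    """Replace up to n words that have an entry in `synonyms`."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(words, n, rng, synonyms=TOY_SYNONYMS):
    """Insert a synonym of a randomly chosen word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in synonyms]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), new_word)
    return out


def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda(sentence, alpha=0.1, seed=None):
    """Return one augmented sentence per operation (assumed interface)."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # assumed rule for the number of edits
    return [
        " ".join(synonym_replacement(words, n, rng)),
        " ".join(random_insertion(words, n, rng)),
        " ".join(random_swap(words, n, rng)),
        " ".join(random_deletion(words, alpha, rng)),
    ]
```

Calling `eda("a quick and good movie", alpha=0.2, seed=0)` returns four variants of the sentence, one per operation. Each variant would keep the label of the original example when added to the training set.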
