Word Embedding Perturbation for Sentence Classification
This work addresses overfitting in NLP, a concern for both researchers and practitioners, but the contribution is incremental because it applies existing noise methods to word embeddings.
The authors tackled overfitting in natural language processing by applying data augmentation through perturbing word embeddings with various noise types, resulting in improved performance for baseline models on sentence classification tasks.
In this technical report, we aim to mitigate overfitting in natural language processing by applying data augmentation methods. Specifically, we try several types of noise to perturb the input word embeddings, including Gaussian noise, Bernoulli noise, and adversarial noise. We also apply several constraints to the different types of noise. With these data augmentation methods, the baseline models improve on several sentence classification tasks.
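To make the idea of perturbing input word embeddings concrete, here is a minimal sketch of two of the noise types named above, Gaussian and Bernoulli. It is an illustration under stated assumptions, not the authors' implementation: the function names, the additive form of the Gaussian noise, the dropout-style rescaling of the Bernoulli mask, and the parameters `sigma` and `keep_prob` are all hypothetical choices, since the abstract does not specify them. Adversarial noise is left out because it requires gradients from a trained model.

```python
import random


def gaussian_perturb(embeddings, sigma=0.1, rng=None):
    """Add zero-mean Gaussian noise to every embedding component.

    Assumption: the noise is additive with standard deviation `sigma`.
    """
    if rng is None:
        rng = random.Random()
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]


def bernoulli_perturb(embeddings, keep_prob=0.9, rng=None):
    """Mask each component with a Bernoulli(keep_prob) draw.

    Assumption: kept components are rescaled by 1 / keep_prob so the
    expected value is unchanged (as in inverted dropout).
    """
    if rng is None:
        rng = random.Random()
    return [
        [x / keep_prob if rng.random() < keep_prob else 0.0 for x in vec]
        for vec in embeddings
    ]


# Toy "sentence": 3 tokens, each with a 4-dimensional embedding.
sentence = [
    [0.10, -0.20, 0.30, 0.05],
    [0.40, 0.00, -0.10, 0.25],
    [-0.30, 0.15, 0.20, -0.05],
]

rng = random.Random(0)  # seeded so the augmentation is reproducible
noisy = gaussian_perturb(sentence, sigma=0.05, rng=rng)
masked = bernoulli_perturb(sentence, keep_prob=0.8, rng=rng)
```

In a training loop, such a perturbation would typically be resampled for every mini-batch and switched off at evaluation time, so that the classifier sees a slightly different version of each sentence on every pass.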