CL | Oct 6, 2022

BootAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

arXiv:2210.02941v20.89 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of insufficient training data in NLP, which affects both researchers and practitioners. It is an incremental advance because it builds on existing augmentation methods.

The paper tackles the performance drop that text augmentation causes on large datasets. It proposes BootAug, a hybrid instance-filtering framework that keeps augmented instances in a feature space similar to that of natural data, improving classification accuracy by approximately 2-3%.

Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a drop in performance on the augmented data (for example, EDA generally loses $\approx 2\%$ in aspect-based sentiment classification). To address this problem, we propose a hybrid instance-filtering framework (BootAug) based on pre-trained language models that can maintain a similar feature space with natural datasets. BootAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves the augmentation performance by $\approx 2-3\%$ in classification accuracy. Our experimental results on three classification tasks and nine public datasets show that BootAug addresses the performance drop problem and outperforms state-of-the-art text augmentation methods. Additionally, we release the code to help improve existing augmentation methods on large datasets.
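The abstract does not spell out the filtering criteria, so the following is only a minimal sketch of the general idea of instance filtering by feature-space similarity, not BootAug's actual method. It assumes each instance already has a feature vector (in practice these would come from a pre-trained language model; here they are hand-written toy vectors), and it keeps an augmented instance only if its cosine similarity to the centroid of natural instances with the same label meets a threshold. The names `filter_augmented`, `centroid`, `cosine`, and the threshold value 0.9 are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def filter_augmented(natural, augmented, threshold=0.9):
    """Keep augmented (vector, label) pairs that lie close to the natural data.

    Assumption: closeness is measured against the per-label centroid of the
    natural feature vectors. This is an illustrative stand-in, not the
    filtering rule used in the paper.
    """
    by_label = {}
    for vec, label in natural:
        by_label.setdefault(label, []).append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}

    kept = []
    for vec, label in augmented:
        ref = centroids.get(label)
        if ref is not None and cosine(vec, ref) >= threshold:
            kept.append((vec, label))
    return kept


# Toy feature vectors standing in for language-model embeddings.
natural = [([1.0, 0.0], "pos"), ([0.9, 0.1], "pos"), ([0.0, 1.0], "neg")]
augmented = [
    ([0.95, 0.05], "pos"),  # close to the natural "pos" instances
    ([0.1, 0.9], "pos"),    # labeled "pos" but shifted toward "neg"
    ([0.05, 1.0], "neg"),   # close to the natural "neg" instances
]
kept = filter_augmented(natural, augmented)
```

In this toy run the second augmented instance is discarded because its features have drifted away from the natural "pos" instances, which is the kind of feature-space shift the abstract attributes to existing augmentation methods.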
