CL | AI | Feb 8, 2024

AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource Regimes

arXiv:2402.05584v1104 citations | h-index: 8 | EACL
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific issue for low-resource NLP applications by providing an incremental improvement to mitigate semantic damage in rule-based augmentation.

The paper tackles semantic damage in rule-based text data augmentation by adapting AutoAugment. The results suggest that this improves existing augmentation methods and that rule-based methods can enhance cutting-edge pre-trained language models.

Text data augmentation is a complex problem due to the discrete nature of sentences. Although rule-based augmentation methods are widely adopted in real-world applications because of their simplicity, they can damage the semantics of the original sentence. Previous researchers have suggested easy data augmentation with soft labels (softEDA), which employs label smoothing to mitigate this problem. However, finding the best smoothing factor for each model and dataset is challenging, so softEDA remains difficult to use in real-world applications. In this paper, we propose adapting AutoAugment to solve this problem. The experimental results suggest that the proposed method can boost existing augmentation methods and that rule-based methods can enhance cutting-edge pre-trained language models. We offer the source code.
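To make the softEDA idea in the abstract concrete, the following is a minimal sketch of rule-based augmentation paired with a smoothed label. It is not the authors' implementation: the function names (`random_swap`, `random_deletion`, `smooth_label`, `augment_with_soft_label`), the choice of two rules, and the default values are all assumptions made for illustration. The smoothing factor `alpha` stands in for the hyperparameter that the abstract describes as hard to choose per model and dataset.

```python
import random


def random_swap(tokens, rng):
    """Swap two randomly chosen positions (an EDA-style rule)."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def smooth_label(label, num_classes, alpha):
    """Standard label smoothing: (1 - alpha) * one_hot + alpha / num_classes."""
    base = alpha / num_classes
    soft = [base] * num_classes
    soft[label] += 1.0 - alpha
    return soft


def augment_with_soft_label(sentence, label, num_classes, alpha,
                            p_delete=0.1, seed=0):
    """Apply one randomly chosen rule and return (new_sentence, soft_label).

    The augmented sentence may have lost some meaning, so its label is
    softened by `alpha` (hypothetical parameter name) instead of kept one-hot.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    if rng.choice(["swap", "delete"]) == "swap":
        new_tokens = random_swap(tokens, rng)
    else:
        new_tokens = random_deletion(tokens, p_delete, rng)
    return " ".join(new_tokens), smooth_label(label, num_classes, alpha)
```

A call such as `augment_with_soft_label("the movie was great", 1, 2, alpha=0.2)` returns a perturbed sentence together with a two-class soft label. In this sketch `alpha` is fixed by hand; a policy search in the spirit of AutoAugment would instead treat it, along with the choice and strength of each rule, as values to be searched.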

Code Implementations: 1 repo
