Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding
The work addresses data scarcity for NLU tasks such as intent detection and slot filling. The contribution is incremental: it extends existing data augmentation approaches with entity awareness.
The paper tackles the shortage of annotated data for natural language understanding (NLU) tasks by proposing Entity Aware Data Augmentation (EADA), which uses an Entity Aware Syntax Tree (EAST) to generate training instances. On four datasets, EADA is reported to significantly outperform existing data augmentation methods in both accuracy and generalization.
Understanding users' intentions and recognizing the semantic entities in their sentences, known as natural language understanding (NLU), is upstream of many natural language processing tasks. One of the main challenges is collecting enough annotated data to train a model. Existing research on text augmentation gives little consideration to entities and therefore performs poorly on NLU tasks. To solve this problem, we propose a novel NLP data augmentation technique, Entity Aware Data Augmentation (EADA), which represents sentences with a tree structure, the Entity Aware Syntax Tree (EAST), that pays explicit attention to entities. EADA automatically constructs an EAST from a small amount of annotated data and then generates a large number of training instances for intent detection and slot filling. Experimental results on four datasets show that the proposed technique significantly outperforms existing data augmentation methods in terms of both accuracy and generalization ability.
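The construct-then-generate idea can be illustrated with a much simpler stand-in. The sketch below is not the EAST construction, which the abstract does not specify. It assumes BIO-tagged utterances, collapses each entity span into a slot placeholder to form a flat template, collects the observed entity values per slot, and recombines the two. The names split_template and augment, the (intent, tokens, tags) input format, and the flat (non-tree) template are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def split_template(tokens, tags):
    """Collapse BIO-tagged entity spans into slot placeholders.

    Returns (template, entities). The template is a tuple whose items are
    either literal tokens (str) or ("SLOT", name) markers; entities is a
    list of (name, value_tokens) pairs in sentence order.
    """
    template, entities = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            name = tag[2:]
            j = i + 1
            # Extend the span over the following I-<name> tags.
            while j < len(tokens) and tags[j] == "I-" + name:
                j += 1
            template.append(("SLOT", name))
            entities.append((name, tuple(tokens[i:j])))
            i = j
        else:
            # Anything that does not open a span is kept as a literal token.
            template.append(tokens[i])
            i += 1
    return tuple(template), entities


def augment(examples):
    """Fill every observed template with every observed value of its slots.

    examples: iterable of (intent, tokens, tags) with BIO slot tags.
    The output includes the original utterances as well as new ones.
    """
    lexicon = defaultdict(set)  # slot name -> set of value token tuples
    templates = set()           # (intent, template) pairs
    for intent, tokens, tags in examples:
        template, entities = split_template(tokens, tags)
        templates.add((intent, template))
        for name, value in entities:
            lexicon[name].add(value)

    generated = []
    for intent, template in sorted(templates, key=repr):
        slots = [item[1] for item in template if isinstance(item, tuple)]
        choices = [sorted(lexicon[name]) for name in slots]
        for combo in product(*choices):
            values = iter(combo)
            tokens, tags = [], []
            for item in template:
                if isinstance(item, tuple):
                    name = item[1]
                    value = next(values)
                    tokens.extend(value)
                    tags.extend(["B-" + name] + ["I-" + name] * (len(value) - 1))
                else:
                    tokens.append(item)
                    tags.append("O")
            generated.append((intent, tokens, tags))
    return generated
```

Because entity values are substituted as whole spans and re-tagged on the way out, the generated slot labels stay aligned with the tokens, which is the property that entity-agnostic augmentation (for example, random word swaps) tends to break. A tree-based representation such as EAST would presumably also allow recombination below the sentence level, which this flat sketch does not attempt.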