CL, LG · Apr 16, 2021

Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

Akhila Yerukola, Mason Bretan, Hongxia Jin

arXiv:2104.08268v1 32.7801 citations

Originality: Incremental advance

AI Analysis

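This work targets data scarcity and the resulting performance limits in voice-assistant natural language understanding (NLU). It appears incremental: it proposes one more augmentation method in a space already covered by techniques such as synonym replacement and back-translation, which the paper uses as comparison baselines.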

The paper aims to improve spoken language understanding for voice assistants. It introduces a data augmentation technique built on byte pair encoding and a BERT-like self-attention model. The authors report strong performance on domain and intent classification tasks and in a user study of utterance naturalness and semantic similarity.

We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare and evaluate this method with a range of augmentation techniques encompassing generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show our method performs strongly on domain and intent classification tasks for a voice assistant and in a user-study focused on utterance naturalness and semantic similarity.
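For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of synonym-replacement augmentation. It is not the paper's BERT-based method. The synonym table, the function names `synonym_replace` and `augment`, and the replacement probability `p` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration);
# a real system would draw on a lexical resource or a learned model.
SYNONYMS = {
    "play": ["start", "put on"],
    "song": ["track", "tune"],
}


def synonym_replace(utterance, synonyms, rng, p=0.5):
    """Return a copy of `utterance` with some words swapped for synonyms.

    Each word that has an entry in `synonyms` is replaced with
    probability `p`; all other words are kept unchanged.
    """
    out = []
    for word in utterance.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


def augment(utterance, synonyms, n=3, seed=0, p=0.5):
    """Collect up to `n` distinct variants that differ from the input."""
    rng = random.Random(seed)
    variants = set()
    for _ in range(n * 10):  # bounded number of attempts
        candidate = synonym_replace(utterance, synonyms, rng, p)
        if candidate != utterance:
            variants.add(candidate)
        if len(variants) >= n:
            break
    return sorted(variants)


if __name__ == "__main__":
    for variant in augment("play a song", SYNONYMS):
        print(variant)
```

Each variant would keep the label of the original utterance (for example, its domain and intent), which is how augmented data of this kind is usually added to a classifier's training set. Word-level substitution like this can produce unnatural phrasings, which is the kind of weakness a naturalness and semantic-similarity user study is meant to expose.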
