CL LG Apr 16, 2021

Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

arXiv:2104.08268v1801 citations
Originality: Incremental advance
AI Analysis

This work addresses data scarcity and performance issues in voice-assistant NLU, though it appears incremental as it builds on existing augmentation methods like synonym replacement and back-translation.

The paper aims to improve spoken language understanding for voice assistants by introducing a data augmentation technique based on byte pair encoding and a BERT-like self-attention model. It reports strong performance on domain and intent classification tasks, and in a user study on utterance naturalness and semantic similarity.

We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare and evaluate this method with a range of augmentation techniques encompassing generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show our method performs strongly on domain and intent classification tasks for a voice assistant and in a user study focused on utterance naturalness and semantic similarity.
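
For readers unfamiliar with byte pair encoding, the sketch below shows the standard merge-learning procedure on a toy word list. It is an illustration only, not the paper's tokenizer or configuration: the function names (learn_bpe, merge_pair, encode), the "</w>" end-of-word marker, and the tie-breaking rule are all assumptions made for this example. The BERT-like rephrasing model from the abstract is not reproduced here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (toy version)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    # The "</w>" marker is an assumption of this sketch.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically (assumed rule)
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Segment a word into subword units by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

With merges learned from a small corpus, encode splits an unseen word into known subword units, which is what lets a model operate on rare or novel words in utterances.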
