CL, Sep 7, 2018

Data Augmentation for Spoken Language Understanding via Joint Variational Generation

arXiv:1809.02305v2, 93 citations
Originality: Incremental advance
AI Analysis

This work addresses the high cost of annotation for domain adaptation in spoken language understanding (SLU) and offers a method to alleviate data scarcity. The contribution appears incremental, since it builds on existing latent variable models.

The paper tackles data scarcity in spoken language understanding (SLU) by proposing a joint variational generative model that synthesizes fully annotated utterances. Existing SLU models trained on the additional synthetic examples show performance gains, which the authors support with experiments and statistical testing.

Data scarcity is one of the main obstacles to domain adaptation in spoken language understanding (SLU) due to the high cost of creating manually tagged SLU datasets. Recent work on neural text generative models, particularly latent variable models such as the variational autoencoder (VAE), has shown promising results in generating plausible and natural sentences. In this paper, we propose a novel generative architecture which leverages the generative power of latent variable models to jointly synthesize fully annotated utterances. Our experiments show that existing SLU models trained on the additional synthetic examples achieve performance gains. Our approach not only helps alleviate the data scarcity issue in the SLU task for many datasets but also indiscriminately improves language understanding performance for various SLU models, supported by extensive experiments and rigorous statistical testing.
