StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing
This work addresses the bottleneck that expensive annotation creates for semantic parsing in natural language processing applications. The contribution is incremental, since it builds on existing VAE methods.
The paper tackles the problem of limited labeled data in semantic parsing by introducing StructVAE, a variational auto-encoding model that uses tree-structured latent variables to learn from both labeled and unlabeled natural language utterances. With extra unlabeled data, it improves over supervised models on ATIS semantic parsing and Python code generation.
Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and time-consuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns both from limited amounts of parallel data and from readily available unlabeled NL utterances. StructVAE models latent MRs not observed in the unlabeled data as tree-structured latent variables. Experiments on semantic parsing on the ATIS domain and Python code generation show that with extra unlabeled data, StructVAE outperforms strong supervised models.
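To make the idea of learning from both parallel and unlabeled data concrete, a semi-supervised variational auto-encoding objective can be sketched in its generic form. This is an illustrative sketch under assumed notation, not a formula stated in the abstract: x is an NL utterance, z is its tree-structured MR, L and U are hypothetical names for the labeled and unlabeled sets, q_phi(z | x) plays the role of the parser, p_theta(x | z) reconstructs the utterance from the MR, p(z) is a prior over MRs, and alpha is an assumed weighting hyperparameter.

```latex
% Generic semi-supervised VAE objective (assumed notation, illustrative only)
\mathcal{J}
  = \sum_{(x, z) \in L} \log q_\phi(z \mid x)
  + \alpha \sum_{x \in U}
    \Big(
      \mathbb{E}_{z \sim q_\phi(z \mid x)} \big[ \log p_\theta(x \mid z) \big]
      - \mathrm{KL}\big( q_\phi(z \mid x) \,\|\, p(z) \big)
    \Big)
```

The first sum is the usual supervised log-likelihood on annotated pairs. The second is the standard evidence lower bound on log p(x) for each unlabeled utterance, where the unobserved MR is treated as a latent variable. The exact terms and weighting used by StructVAE may differ from this generic form.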