CL, Nov 3, 2020

Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations

arXiv:2011.02050v131.0996 citations

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in task-oriented semantic parsing for conversational AI. It offers a method to reduce labeling costs for new domains, though the advance is incremental because it builds on existing models such as BART.

The paper tackles synthetic data generation for training semantic parsers with hierarchical representations, where labeled data is scarce or expensive. It fine-tunes a pretrained BART model to generate utterances from masked templates and filters the generated utterances with an auxiliary parser. The approach is evaluated on the navigation domain of the Facebook TOP dataset.

Modern conversational AI systems support natural language understanding for a wide variety of capabilities. While a majority of these tasks can be accomplished using a simple and flat representation of intents and slots, more sophisticated capabilities require complex hierarchical representations supported by semantic parsing. State-of-the-art semantic parsers are trained using supervised learning with data labeled according to a hierarchical schema, which might be costly to obtain or not readily available for a new domain. In this work, we explore the possibility of generating synthetic data for neural semantic parsing using a pretrained denoising sequence-to-sequence model (i.e., BART). Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioned on the extracted templates. Finally, we use an auxiliary parser (AP) to filter the generated utterances. The AP guarantees the quality of the generated data. We show the potential of our approach when evaluating on the Facebook TOP dataset for the navigation domain.
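
As a rough illustration of the first and last steps in the abstract (template extraction and auxiliary-parser filtering), here is a minimal Python sketch. It makes several assumptions that the abstract does not confirm. Annotations are taken to be in TOP-style bracket notation. The words of each slot are replaced by a single `<mask>` token. A generated utterance is kept only when the auxiliary parser's predicted intent/slot frame matches the frame of the template. The function names `extract_template`, `frame`, and `filter_with_auxiliary_parser` are hypothetical, and the BART fine-tuning and generation step is omitted. The paper's actual masking scheme and filtering criterion may differ.

```python
from typing import Callable, Iterable, List, Tuple

MASK = "<mask>"  # assumed mask symbol; chosen to match BART's mask token


def extract_template(annotated: str) -> str:
    """Replace the words directly inside each slot with a single mask token.

    Assumes a TOP-style annotation with whitespace-separated tokens, e.g.
    "[IN:GET_DIRECTIONS Directions to [SL:DESTINATION the airport ] ]".
    """
    out: List[str] = []
    stack: List[str] = []  # labels of the currently open brackets
    for tok in annotated.split():
        if tok.startswith("["):
            stack.append(tok[1:])
            out.append(tok)
        elif tok == "]":
            stack.pop()
            out.append(tok)
        elif stack and stack[-1].startswith("SL:"):
            # Collapse consecutive slot words into one mask.
            if out[-1] != MASK:
                out.append(MASK)
        else:
            out.append(tok)
    return " ".join(out)


def frame(annotated: str) -> str:
    """Keep only the bracket structure (intent and slot labels)."""
    return " ".join(t for t in annotated.split() if t.startswith("[") or t == "]")


def filter_with_auxiliary_parser(
    candidates: Iterable[Tuple[str, str]],
    parse: Callable[[str], str],
) -> List[str]:
    """Keep utterances whose predicted frame matches the template's frame.

    `candidates` yields (template, generated_utterance) pairs; `parse` is a
    stand-in for the auxiliary parser and returns a TOP-style annotation.
    """
    return [
        utterance
        for template, utterance in candidates
        if frame(parse(utterance)) == frame(template)
    ]
```

In this sketch, frame matching is a deliberately strict filter: it discards any generated utterance whose predicted intents or slots differ from the template, which trades recall for cleaner synthetic data.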
