CL · Apr 12, 2021

Learning to Synthesize Data for Semantic Parsing

arXiv:2104.05827v2 · 737 citations
Originality: Incremental advance
AI Analysis

This work addresses data scarcity for semantic parsing, offering a method to generate more diverse training data without handcrafted rules, though it is incremental as it builds on existing PCFG and BART techniques.

The paper tackles the problem of synthesizing diverse data for semantic parsing. It proposes a generative model that combines a PCFG for program composition with a BART-based translation model, and reports improved compositional and domain generalization in text-to-SQL parsing on the GeoQuery and Spider benchmarks.

Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model which features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that maps a program to an utterance. Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand. Moreover, explicitly modeling compositions using PCFG leads to a better exploration of unseen programs, and thus generates more diverse data. We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider, respectively. Our empirical results show that the synthesized data generated from our model can substantially help a semantic parser achieve better compositional and domain generalization.
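To make the PCFG half of the abstract concrete, the sketch below estimates rule probabilities by relative frequency from two annotated programs and then samples new ones. It is a minimal illustration under stated assumptions, not the paper's implementation: the grammar, the nonterminal names (`Query`, `Col`, `Table`, `Where`, `Op`, `Val`), the two training derivations, and the helpers `estimate_pcfg` and `sample_program` are all hypothetical, and the BART-based program-to-utterance step is left out entirely.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy derivations: each program is a list of (lhs, rhs) rule uses.
# Symbols that never appear as an lhs are treated as terminals (SQL tokens).
DERIVATIONS = [
    [  # SELECT name FROM city WHERE population > 100000
        ("Query", ("SELECT", "Col", "FROM", "Table", "Where")),
        ("Col", ("name",)),
        ("Table", ("city",)),
        ("Where", ("WHERE", "Col", "Op", "Val")),
        ("Col", ("population",)),
        ("Op", (">",)),
        ("Val", ("100000",)),
    ],
    [  # SELECT area FROM state
        ("Query", ("SELECT", "Col", "FROM", "Table", "Where")),
        ("Col", ("area",)),
        ("Table", ("state",)),
        ("Where", ()),  # empty expansion: no WHERE clause
    ],
]


def estimate_pcfg(derivations):
    """Relative-frequency (maximum-likelihood) estimate of rule probabilities."""
    counts = defaultdict(Counter)
    for derivation in derivations:
        for lhs, rhs in derivation:
            counts[lhs][rhs] += 1
    return {
        lhs: {rhs: n / sum(rhs_counts.values()) for rhs, n in rhs_counts.items()}
        for lhs, rhs_counts in counts.items()
    }


def sample_program(pcfg, symbol, rng):
    """Expand `symbol` top-down, drawing each rule according to its probability."""
    if symbol not in pcfg:  # terminal token
        return [symbol]
    rules = list(pcfg[symbol])
    weights = [pcfg[symbol][rhs] for rhs in rules]
    rhs = rng.choices(rules, weights=weights)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample_program(pcfg, child, rng))
    return tokens


pcfg = estimate_pcfg(DERIVATIONS)
rng = random.Random(0)
samples = [" ".join(sample_program(pcfg, "Query", rng)) for _ in range(50)]
```

Because rules are drawn independently, the sampler can recombine pieces it has only seen in separate programs, for example a WHERE clause attached to the `state` table, which is the kind of unseen composition the abstract refers to. This toy version ignores schema consistency, so some samples pair a column with a table that does not contain it.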

Code Implementations: 1 repo