CL · AI · May 27, 2023

Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques

arXiv:2305.17378v1232 citations
Originality: Incremental advance
AI Analysis

This work addresses generalization challenges in semantic parsing for applications such as database querying. It is an incremental advance, since it builds on existing LM-based methods with simple modifications.

The paper tackles compositional and domain generalization in language model-based text-to-SQL semantic parsing by introducing two simple semantic boundary-based techniques, which yield substantial performance improvements on two datasets.

Compositional and domain generalization present significant challenges in semantic parsing, even for state-of-the-art semantic parsers based on pre-trained language models (LMs). In this study, we empirically investigate improving an LM's generalization in semantic parsing with two simple techniques: at the token level, we introduce a token preprocessing method to preserve the semantic boundaries of tokens produced by LM tokenizers; at the sequence level, we propose to use special tokens to mark the boundaries of components aligned between input and output. Our experimental results on two text-to-SQL semantic parsing datasets show that our token preprocessing, although simple, can substantially improve the LM performance on both types of generalization, and our component boundary marking method is particularly helpful for compositional generalization.
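To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the splitting rules (dots, underscores, camelCase), the function names, and the `[sepN]` marker format are all assumptions chosen for illustration. The first part inserts spaces at likely semantic boundaries so that a subword tokenizer is less likely to merge characters across them; the second part wraps each aligned component in a pair of marker tokens.

```python
import re
from typing import List


def split_identifier(token: str) -> str:
    """Insert spaces at likely semantic boundaries inside one token.

    The rules here are illustrative assumptions, not the paper's exact rules.
    """
    # Put spaces around dots and underscores so each part stands alone.
    token = re.sub(r"([._])", r" \1 ", token)
    # Split camelCase: a lower-case letter or digit followed by an upper-case letter.
    token = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", token)
    # Collapse any repeated whitespace introduced above.
    return re.sub(r"\s+", " ", token).strip()


def preprocess(text: str) -> str:
    """Apply split_identifier to every whitespace-separated token."""
    return " ".join(split_identifier(tok) for tok in text.split())


def mark_components(components: List[str]) -> str:
    """Wrap each component in hypothetical boundary markers [sepN] ... [/sepN].

    Using the same index N on the input side and the output side is one
    way to signal which components are aligned.
    """
    marked = [f"[sep{i}] {comp} [/sep{i}]" for i, comp in enumerate(components)]
    return " ".join(marked)


if __name__ == "__main__":
    print(preprocess("SELECT T1.pet_age FROM petOwner"))
    print(mark_components(["how many pets", "older than 3"]))
```

In practice the marker strings would also need to be registered as special tokens in the LM's tokenizer so that they are not split into subwords; that step depends on the tokenizer library and is omitted here.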

Code Implementations: 1 repo
