Global Reasoning over Database Structures for Text-to-SQL Parsing
The work addresses the challenge of generating accurate SQL queries for databases unseen during training. The contribution is incremental in that it builds on an existing state-of-the-art model.
The paper tackles zero-shot text-to-SQL parsing on complex databases by proposing a semantic parser that reasons globally over the database structure to improve the selection of database constants, increasing accuracy on the Spider dataset from 39.4% to 47.4%.
State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time. When tested against complex databases that are unobserved at training time (zero-shot), the parser often struggles to select the correct set of database constants in the new database, due to the local nature of decoding. In this work, we propose a semantic parser that globally reasons about the structure of the output query to make a more contextually-informed selection of database constants. We use message-passing through a graph neural network to softly select a subset of database constants for the output query, conditioned on the question. Moreover, we train a model to rank queries based on the global alignment of database constants to question words. We apply our techniques to the current state-of-the-art model for Spider, a zero-shot semantic parsing dataset with complex databases, increasing accuracy from 39.4% to 47.4%.
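The idea of softly selecting database constants by message passing can be illustrated with a toy sketch. This is not the paper's architecture: the schema graph, the per-constant question-match logits, the mean aggregation, and the names `soft_select`, `rounds`, and `mix` are all assumptions made for illustration. Each table or column starts with a local score, exchanges scores with its schema neighbors, and ends with a relevance probability rather than a hard yes/no choice.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_select(local_scores, edges, rounds=2, mix=0.5):
    """Toy soft selection of schema constants (illustrative assumption only).

    local_scores: dict mapping a constant name to a hypothetical logit
                  measuring how well it matches the question in isolation.
    edges:        undirected schema links, e.g. (table, column) pairs.
    rounds:       number of message-passing rounds.
    mix:          weight given to the neighbors' mean score in each round.
    """
    neighbors = {name: [] for name in local_scores}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    scores = dict(local_scores)
    for _ in range(rounds):
        updated = {}
        for name, own in scores.items():
            nbrs = neighbors[name]
            # Mean of neighbor scores; isolated nodes keep their own score.
            msg = sum(scores[n] for n in nbrs) / len(nbrs) if nbrs else own
            updated[name] = (1.0 - mix) * own + mix * msg
        scores = updated

    # Soft selection: a probability per constant, not a hard subset.
    return {name: sigmoid(s) for name, s in scores.items()}


# Hypothetical schema and logits for a question such as
# "How many singers are older than 30?"
local_scores = {
    "singer": 2.0,
    "singer.name": -1.0,
    "singer.age": 1.5,
    "concert": -2.0,
    "concert.year": -1.5,
}
edges = [
    ("singer", "singer.name"),
    ("singer", "singer.age"),
    ("concert", "concert.year"),
]

probs = soft_select(local_scores, edges)
for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {p:.3f}")
```

In this toy setup, constants attached to a question-relevant table are pulled upward and constants in an unrelated table stay low, which is the kind of contextual signal a purely local, symbol-by-symbol decoder lacks. A learned model would replace the fixed averaging with trained graph neural network layers conditioned on the question.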