Prefix-to-SQL: Text-to-SQL Generation from Incomplete User Questions
This work addresses a practical problem for lay users of database systems by enabling the processing of incomplete questions, though its contribution is incremental, since it builds on existing text-to-SQL methods.
The paper tackles text-to-SQL generation from incomplete user questions. It proposes a new task, prefix-to-SQL, and constructs a benchmark, PAGSAS, with 124K question prefixes. Experimental results show that curriculum learning improves recall scores by as much as 9% on the GeoQuery sub-task.
Existing text-to-SQL research considers only complete questions as input, but lay users might struggle to formulate a complete question. To build a smarter natural language interface to database systems (NLIDB) that also processes incomplete questions, we propose a new task, prefix-to-SQL, which takes a question prefix from the user as input and predicts the intended SQL. We construct a new benchmark called PAGSAS that contains 124K user question prefixes and the intended SQL for 5 sub-tasks: Advising, GeoQuery, Scholar, ATIS, and Spider. Additionally, we propose a new metric, SAVE, to measure how much effort users can save. Experimental results show that PAGSAS is challenging even for strong baseline models such as T5. Because we observe that the difficulty of prefix-to-SQL is related to the number of omitted tokens, we incorporate curriculum learning, feeding the model examples with an increasing number of omitted tokens. This improves scores on various sub-tasks, by as much as 9% in recall on the GeoQuery sub-task in PAGSAS.
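
The prefix construction and the curriculum ordering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the inclusion of the full question as a zero-omission prefix, the cumulative stage schedule, the example SQL string, and the names `PrefixExample`, `make_prefix_examples`, and `curriculum_stages` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefixExample:
    prefix: str          # incomplete question given to the model
    sql: str             # intended SQL of the full question
    omitted_tokens: int  # tokens dropped from the end of the full question


def make_prefix_examples(question: str, sql: str) -> List[PrefixExample]:
    """Build one example per non-empty prefix of the question.

    Assumption: tokens are whitespace-separated, and the full question
    is kept as the prefix with zero omitted tokens.
    """
    tokens = question.split()
    return [
        PrefixExample(" ".join(tokens[:keep]), sql, len(tokens) - keep)
        for keep in range(1, len(tokens) + 1)
    ]


def curriculum_stages(
    examples: List[PrefixExample], limits: List[int]
) -> List[List[PrefixExample]]:
    """Return cumulative training sets, one per stage.

    Each stage admits examples with at most `limit` omitted tokens, so
    later stages add harder (shorter) prefixes. The cumulative schedule
    is an assumption, not taken from the paper.
    """
    return [[ex for ex in examples if ex.omitted_tokens <= limit] for limit in limits]


# Hypothetical GeoQuery-style pair, for illustration only.
examples = make_prefix_examples(
    "what is the capital of texas",
    "SELECT capital FROM state WHERE state_name = 'texas'",
)
stages = curriculum_stages(examples, limits=[0, 2, 5])
for stage_index, stage in enumerate(stages):
    print(stage_index, [ex.prefix for ex in stage])
```

Under these assumptions, a model would be trained on `stages[0]` first and on each later stage in turn, so that prefixes with more omitted tokens are introduced gradually.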