CL AI | Oct 19, 2022

N-Best Hypotheses Reranking for Text-To-SQL Systems

Lu Zeng, Sree Hari Krishnan Parthasarathi, Dilek Hakkani-Tur

arXiv:2210.10668v14.829 citations | h-index: 61

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enhancing accuracy in text-to-SQL systems for database query generation, though it is incremental as it builds on existing methods with modest gains.

The paper tackled the problem of improving text-to-SQL systems by reranking the N-best hypotheses from a state-of-the-art model, achieving a 1% improvement in exact match accuracy and a ~2.5% improvement in execution accuracy on the Spider dataset, establishing a new state-of-the-art.

The text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding that applies a SQL parser. On the well-established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list yields a $7.7\%$ absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential for improvement with reranking. Identifying coherence and correctness as reranking approaches, we design a model generating a query plan and propose a heuristic schema linking algorithm. Combining both approaches, with T5-Large, we obtain a consistent $1\%$ improvement in EM accuracy, and a $\sim 2.5\%$ improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.
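The Oracle study described in the abstract can be illustrated with a minimal sketch: for each utterance, top-1 accuracy scores only the first hypothesis, while Oracle accuracy counts the example as correct if any hypothesis in the N-best list is correct. The gap between the two is the headroom available to a reranker. The function names and the `normalized_match` comparison below are assumptions made for illustration; they are not the paper's code, and `normalized_match` is only a crude stand-in for Spider's exact match metric.

```python
from typing import Callable, Sequence, Tuple


def normalized_match(hypothesis: str, reference: str) -> bool:
    """Hypothetical stand-in for an exact-match check.

    Compares queries case-insensitively after collapsing whitespace.
    Spider's official EM metric is more involved; this is only a sketch.
    """
    def norm(query: str) -> str:
        return " ".join(query.lower().split())

    return norm(hypothesis) == norm(reference)


def top1_and_oracle_accuracy(
    nbest_lists: Sequence[Sequence[str]],
    references: Sequence[str],
    is_correct: Callable[[str, str], bool] = normalized_match,
) -> Tuple[float, float]:
    """Return (top-1 accuracy, oracle accuracy) over a set of N-best lists."""
    if len(nbest_lists) != len(references):
        raise ValueError("need exactly one reference per N-best list")
    if not references:
        raise ValueError("need at least one example")

    top1_hits = 0
    oracle_hits = 0
    for hypotheses, reference in zip(nbest_lists, references):
        # Top-1: only the highest-ranked hypothesis is scored.
        if hypotheses and is_correct(hypotheses[0], reference):
            top1_hits += 1
        # Oracle: the example counts if any hypothesis in the list is correct.
        if any(is_correct(h, reference) for h in hypotheses):
            oracle_hits += 1

    total = len(references)
    return top1_hits / total, oracle_hits / total
```

Passing a different `is_correct` callable (for example, one that executes both queries against a database and compares the results) would give the execution-accuracy variant of the same comparison.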
