LG DB Sep 21, 2022

T5QL: Taming language models for SQL generation

Samuel Arcadinho, David Aparício, Hugo Veiga, António Alegria

arXiv:2209.10254v143.9295 citations h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making SQL generation more efficient and reliable for database access, though it is incremental as it builds on existing semantic parsing approaches.

The paper tackles automatic SQL generation from natural language, targeting two weaknesses of state-of-the-art methods: high computational cost and no guarantee of valid output. With a smaller language model (T5-Base), it reports a 13 percentage point improvement over SOTA methods on benchmark datasets, and it uses grammar constraints so that the generated SQL is always valid.

Automatic SQL generation has been an active research area, aiming to streamline access to databases by letting users state their intent in natural language instead of writing SQL. Current SOTA methods for semantic parsing depend on LLMs to achieve high predictive accuracy on benchmark datasets. This reduces their applicability, since LLMs require expensive GPUs. Furthermore, SOTA methods are ungrounded and thus not guaranteed to always generate valid SQL. Here we propose T5QL, a new SQL generation method that improves the performance on benchmark datasets when using smaller LMs, namely T5-Base, by 13pp when compared against SOTA methods. Additionally, T5QL is guaranteed to always output valid SQL by using a context-free grammar to constrain SQL generation. Finally, we show that dividing semantic parsing into two tasks, candidate SQL generation and candidate re-ranking, is a promising research avenue that can reduce the need for large LMs.
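
The two ideas in the abstract, grammar-constrained decoding and candidate re-ranking, can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the schema, the vocabulary, a toy finite-state grammar standing in for the paper's context-free grammar, a fixed score table standing in for a language model, and a word-overlap scorer standing in for a learned ranker.

```python
# Hypothetical schema used only for this illustration.
SCHEMA = {"users": ["id", "name"], "orders": ["id", "total"]}


def allowed_tokens(prefix):
    """Tokens that keep `prefix` valid under the toy grammar
    SELECT <column> FROM <table> <EOS> (a stand-in for a real CFG)."""
    n = len(prefix)
    if n == 0:
        return {"SELECT"}
    if n == 1:
        return {c for cols in SCHEMA.values() for c in cols}
    if n == 2:
        return {"FROM"}
    if n == 3:
        # Only tables that actually contain the selected column.
        return {t for t, cols in SCHEMA.items() if prefix[1] in cols}
    if n == 4:
        return {"<EOS>"}
    return set()


def toy_scores(prefix):
    """Stand-in for a language model: a fixed preference per token.
    Unconstrained, it would always pick the invalid token "WHERE"."""
    return {"WHERE": 5.0, "total": 3.0, "orders": 2.5, "users": 2.0,
            "SELECT": 1.0, "FROM": 1.0, "<EOS>": 0.5, "id": 0.2, "name": 0.1}


def constrained_decode(score_fn, max_len=8):
    """Greedy decoding where each step is restricted to grammar-valid tokens."""
    prefix = []
    for _ in range(max_len):
        allowed = allowed_tokens(prefix)
        if not allowed:
            break
        scores = score_fn(prefix)
        # Pick the best-scoring token among the allowed ones only.
        best = max(sorted(allowed), key=lambda t: scores.get(t, float("-inf")))
        if best == "<EOS>":
            break
        prefix.append(best)
    return " ".join(prefix)


def rerank(question, candidates):
    """Hypothetical re-ranker: prefer the candidate sharing the most words
    with the question (a real system would use a learned model)."""
    q_words = set(question.lower().split())
    return max(candidates, key=lambda sql: len(q_words & set(sql.lower().split())))


if __name__ == "__main__":
    print(constrained_decode(toy_scores))
    print(rerank("show the total of all orders",
                 ["SELECT id FROM users", "SELECT total FROM orders"]))
```

Because invalid tokens are masked out at every step, the decoder cannot emit a query outside the grammar, regardless of how the scorer ranks them. The re-ranking step is independent of decoding: any generator that produces several valid candidates could feed it.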
