CL · AI · Apr 26, 2023

Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval

arXiv:2304.13301v2 · 32 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating accurate SQL queries from natural-language questions for database users, offering incremental improvements in demonstration retrieval and prompting techniques.

The paper tackles the text-to-SQL task by proposing an LLM-based framework that uses de-semanticization and skeleton retrieval to improve SQL generation, achieving state-of-the-art performance on three cross-domain benchmarks.

Text-to-SQL is a task that converts a natural language question into a structured query language (SQL) query to retrieve information from a database. Large language models (LLMs) work well in natural language generation tasks, but they are not specifically pre-trained to understand the syntax and semantics of SQL commands. In this paper, we propose an LLM-based framework for Text-to-SQL that retrieves helpful demonstration examples to prompt LLMs. However, questions over different database schemas can vary widely, even if the intentions behind them are similar and the corresponding SQL queries exhibit similarities. Consequently, it becomes crucial to identify the appropriate SQL demonstrations that align with our requirements. We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity. We also model the relationships between question tokens and database schema items (i.e., tables and columns) to filter out schema-related information. Our framework adapts the range of the database schema in prompts to balance length and valuable information. A fallback mechanism allows for a more detailed schema to be provided if the generated SQL query fails. Our framework outperforms state-of-the-art models and demonstrates strong generalization ability on three cross-domain Text-to-SQL benchmarks.
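The de-semanticization and skeleton-retrieval idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`desemanticize`, `build_pool`, `retrieve`), the `<mask>` token, the toy demonstration pool, and the use of plain string matching in place of the paper's learned question-schema relationship model are all assumptions made for illustration. Similarity is computed with the standard library's `difflib.SequenceMatcher`, which is likewise a stand-in for whatever retriever the paper uses.

```python
import re
from difflib import SequenceMatcher

MASK = "<mask>"  # hypothetical placeholder token for schema-related words


def desemanticize(question, schema_items):
    """Return a question skeleton with schema-related tokens masked out.

    Assumption: a token is schema-related if it (or its crude singular form)
    appears among the words of a table or column name.
    """
    vocab = {w.lower() for item in schema_items
             for w in re.split(r"[_\s]+", item) if w}
    skeleton = []
    for tok in re.findall(r"\w+", question.lower()):
        if tok in vocab or tok.rstrip("s") in vocab:
            if not skeleton or skeleton[-1] != MASK:  # collapse adjacent masks
                skeleton.append(MASK)
        else:
            skeleton.append(tok)
    return " ".join(skeleton)


def build_pool(examples):
    """Attach a precomputed skeleton to every demonstration example."""
    return [dict(ex, skeleton=desemanticize(ex["question"], ex["schema"]))
            for ex in examples]


def retrieve(question, schema_items, pool, k=1):
    """Return the k demonstrations whose skeletons best match the question's."""
    skel = desemanticize(question, schema_items)
    ranked = sorted(
        pool,
        key=lambda ex: SequenceMatcher(None, skel, ex["skeleton"]).ratio(),
        reverse=True,
    )
    return ranked[:k]


# Toy demonstration pool (invented for this sketch).
POOL = build_pool([
    {"question": "How many singers are there?",
     "schema": ["singer", "singer_id", "name"],
     "sql": "SELECT count(*) FROM singer"},
    {"question": "List the names of all teachers ordered by age",
     "schema": ["teacher", "name", "age"],
     "sql": "SELECT name FROM teacher ORDER BY age"},
])

best = retrieve("How many cars are there?", ["cars", "car_id", "model"], POOL)
print(best[0]["sql"])
```

In this sketch, a question about cars retrieves the demonstration about singers, because both reduce to the same skeleton once the schema-specific words are masked. That is the intuition the abstract describes: structurally similar questions share similar SQL even across unrelated databases.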
