RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL
This work addresses text-to-SQL over complex databases, a problem relevant to both AI and database users, and represents an incremental improvement over existing methods.
The paper tackles the challenge of text-to-SQL tasks for large databases by proposing RB-SQL, a retrieval-based LLM framework that improves performance through schema and example retrieval, achieving better results than baselines on BIRD and Spider datasets.
Large language models (LLMs) with in-context learning have significantly improved performance on the text-to-SQL task. Previous work generally focuses on designing a dedicated SQL generation prompt to improve the LLM's reasoning ability. However, such methods mostly struggle with large databases that contain numerous tables and columns, and they usually overlook the value of pre-processing the database and extracting useful information for more efficient prompt engineering. Based on this analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering. It consists of three modules that retrieve a concise set of tables and columns as the schema, along with targeted examples for in-context learning. Experimental results demonstrate that our model outperforms several competitive baselines on the public datasets BIRD and Spider.
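To make the retrieve-then-prompt idea concrete, the following is a minimal sketch of how three retrieval steps (tables, columns, examples) could feed a compact in-context prompt. It is an illustration under stated assumptions, not the RB-SQL implementation: the function names (`retrieve_tables`, `retrieve_columns`, `retrieve_examples`, `build_prompt`), the prompt layout, and the token-overlap scoring are all hypothetical, and a real system would likely use a trained or embedding-based retriever instead of Jaccard similarity.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens; 'order_id' becomes {'order', 'id'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(question, text):
    """Jaccard similarity of token sets (toy stand-in for a real retriever)."""
    q, t = tokens(question), tokens(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve_tables(question, schema, k=2):
    """Keep the k tables whose name and column names best match the question."""
    def score(table):
        return overlap(question, table + " " + " ".join(schema[table]))
    return sorted(schema, key=score, reverse=True)[:k]


def retrieve_columns(question, schema, tables, k=3):
    """Within each kept table, keep the k best-matching columns."""
    return {
        t: sorted(schema[t], key=lambda c: overlap(question, c), reverse=True)[:k]
        for t in tables
    }


def retrieve_examples(question, examples, k=1):
    """Keep the k (question, sql) pairs whose question is most similar."""
    return sorted(examples, key=lambda ex: overlap(question, ex[0]), reverse=True)[:k]


def build_prompt(question, schema, examples):
    """Assemble a compact in-context prompt from the three retrieval steps."""
    tables = retrieve_tables(question, schema)
    columns = retrieve_columns(question, schema, tables)
    lines = ["-- Schema"]
    lines += [f"{t}({', '.join(columns[t])})" for t in tables]
    lines.append("-- Examples")
    for ex_question, ex_sql in retrieve_examples(question, examples):
        lines += [f"Q: {ex_question}", f"SQL: {ex_sql}"]
    lines += ["-- Task", f"Q: {question}", "SQL:"]
    return "\n".join(lines)
```

Here `schema` is assumed to be a dict mapping table names to lists of column names, and `examples` a list of (question, SQL) pairs. The point of the design is that the prompt size depends on `k` rather than on the size of the database, which is what makes large schemas tractable for in-context learning.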