Enhancing SQL Query Generation with Neurosymbolic Reasoning
This work aims to improve SQL query generation for database users by strengthening smaller open-source language models, an incremental advance in neurosymbolic methods.
The paper proposes a neurosymbolic architecture for SQL query generation that integrates a language model with symbolic modules, which catch and correct errors and guide exploration of the solution tree. On average, accuracy increases by 10.9% and runtime falls by 28%, enabling a smaller LM with the tool to outperform a four-times larger LM.
Neurosymbolic approaches blend the effectiveness of symbolic reasoning with the flexibility of neural networks. In this work, we propose a neurosymbolic architecture for generating SQL queries that builds and explores a solution tree using Best-First Search, with the possibility of backtracking. To this end, it integrates a Language Model (LM) with symbolic modules that help catch and correct errors made by the LM on SQL queries and that guide the exploration of the solution tree. We focus on improving the performance of smaller open-source LMs. We find that our tool, Xander, increases accuracy by an average of 10.9% and reduces runtime by an average of 28% compared to the LM without Xander, enabling a smaller LM (with Xander) to outperform its four-times larger counterpart (without Xander).
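To make the search procedure concrete, the following is a minimal sketch of Best-First Search with backtracking over partial SQL queries. It is an illustration under stated assumptions, not Xander's implementation: `toy_propose` is a hypothetical stand-in for the LM's next-token proposals, `is_valid_sql` is a hypothetical symbolic check that asks SQLite to compile the query via `EXPLAIN`, and the `users` table, scores, and token-level granularity are invented for the example.

```python
import heapq
import sqlite3
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[str, float]  # (next token, log-probability); assumed LM output format


def is_valid_sql(conn: sqlite3.Connection, query: str) -> bool:
    """Hypothetical symbolic check: SQLite compiles the query without running it."""
    try:
        conn.execute("EXPLAIN " + query)
        return True
    except sqlite3.Error:
        return False


def best_first_search(
    propose: Callable[[Sequence[str]], List[Candidate]],
    is_complete: Callable[[Sequence[str]], bool],
    check: Callable[[str], bool],
    max_expansions: int = 1000,
) -> Optional[str]:
    # Each frontier entry: (negative cumulative log-prob, tie-breaker, tokens so far).
    frontier: List[Tuple[float, int, List[str]]] = [(0.0, 0, [])]
    counter = 1
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, tokens = heapq.heappop(frontier)  # most promising node first
        expansions += 1
        if is_complete(tokens):
            query = " ".join(tokens)
            if check(query):
                return query
            continue  # invalid leaf: drop it and backtrack to the next-best node
        for token, logp in propose(tokens):
            heapq.heappush(frontier, (cost - logp, counter, tokens + [token]))
            counter += 1
    return None


def toy_propose(tokens: Sequence[str]) -> List[Candidate]:
    """Hypothetical LM stand-in that prefers a misspelled column name."""
    table = {
        (): [("SELECT", 0.0)],
        ("SELECT",): [("nme", -0.1), ("name", -0.5)],
        ("SELECT", "nme"): [("FROM", 0.0)],
        ("SELECT", "name"): [("FROM", 0.0)],
        ("SELECT", "nme", "FROM"): [("users", 0.0)],
        ("SELECT", "name", "FROM"): [("users", 0.0)],
    }
    return table.get(tuple(tokens), [])


# Toy schema, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

result = best_first_search(
    propose=toy_propose,
    is_complete=lambda tokens: len(tokens) == 4,
    check=lambda query: is_valid_sql(conn, query),
)
print(result)
```

In this toy setup the higher-scoring branch ends in a query that references a nonexistent column, so the symbolic check rejects it and the search falls back to the lower-scoring branch. A real system would replace the lookup table with LM calls and could apply symbolic checks to partial queries as well, pruning bad branches before they are completed.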