CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning
This work addresses the generation of accurate SQL queries from natural-language questions for database users, and is an incremental improvement over prior methods.
The paper aims to improve text-to-SQL accuracy by addressing limitations of existing test-time scaling methods such as Self-Consistency and Self-Correction. It proposes CSC-SQL, which integrates the two methods and fine-tunes the models with reinforcement learning. On the BIRD private test set, CSC-SQL reaches execution accuracies of 71.72% with a 7B model and 73.67% with a 32B model.
Large language models (LLMs) have demonstrated strong capabilities in translating natural language questions about relational databases into SQL queries. In particular, test-time scaling techniques such as Self-Consistency and Self-Correction can enhance SQL generation accuracy by increasing computational effort during inference. However, these methods have notable limitations: Self-Consistency may select suboptimal outputs despite majority votes, while Self-Correction typically addresses only syntactic errors. To leverage the strengths of both approaches, we propose CSC-SQL, a novel method that integrates Self-Consistency and Self-Correction. CSC-SQL selects the two most frequently occurring outputs from parallel sampling and feeds them into a merge revision model for correction. Additionally, we employ the Group Relative Policy Optimization (GRPO) algorithm to fine-tune both the SQL generation and revision models via reinforcement learning, significantly enhancing output quality. Experimental results confirm the effectiveness and generalizability of CSC-SQL. On the BIRD private test set, our 7B model achieves 71.72% execution accuracy, while the 32B model achieves 73.67%. The code has been open sourced at https://github.com/CycloneBoy/csc_sql.
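The selection step described in the abstract (take the two most frequent outputs from parallel sampling and pass them to a merge revision model) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration: candidates are grouped by whitespace-normalized SQL text (the actual method may group them differently, for example by execution result), the revision model is a hypothetical callable named `revise`, and revision is skipped when all samples agree.

```python
from collections import Counter
from typing import Callable, List, Sequence


def normalize(sql: str) -> str:
    """Collapse whitespace and drop a trailing semicolon.

    Assumption: a crude textual normalization so that trivially
    different strings are counted as the same candidate.
    """
    return " ".join(sql.strip().rstrip(";").split())


def top_two_candidates(samples: Sequence[str]) -> List[str]:
    """Return up to two distinct candidates, most frequent first."""
    if not samples:
        raise ValueError("need at least one sampled query")
    counts = Counter(normalize(s) for s in samples)
    # most_common sorts by count; ties keep first-seen order.
    return [sql for sql, _ in counts.most_common(2)]


def corrective_self_consistency(
    samples: Sequence[str],
    revise: Callable[[str, str], str],
) -> str:
    """Pick the top two candidates and let `revise` produce the final query.

    `revise` is a hypothetical stand-in for the merge revision model.
    Skipping revision when every sample agrees is an assumption.
    """
    candidates = top_two_candidates(samples)
    if len(candidates) == 1:
        return candidates[0]
    return revise(candidates[0], candidates[1])


if __name__ == "__main__":
    sampled = [
        "SELECT a FROM t",
        "SELECT a  FROM t;",
        "SELECT b FROM t",
        "SELECT b FROM t",
        "SELECT b FROM t",
        "SELECT c FROM t",
    ]
    # Placeholder reviser that simply keeps the most frequent candidate.
    print(corrective_self_consistency(sampled, lambda first, second: first))
```

With a placeholder reviser that always returns its first argument, the pipeline reduces to plain majority voting; the revision model is what allows the second most frequent candidate to win when the majority is wrong.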