LG, DB, PL | May 18, 2025

Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward

Han Weng, Puzhen Wu, Longjie Cui, Yi Zhan, Boyi Liu, Yuanfeng Song, Dun Zeng, Yingxiang Yang, Qianru Zhang, Dong Huang, Xiaoming Yin, Yang Sun

arXiv:2505.12380v3 | 16.98 citations | h-index: 3 | Has Code | EMNLP

Originality: Incremental advance

AI Analysis

The work addresses scalability issues in RL pipelines for Text-to-SQL by offering a more efficient reward mechanism for database query generation. It is an incremental advance, since it builds on existing RL approaches.

The paper tackles the inefficiency of existing reinforcement learning reward models for Text-to-SQL by proposing Graph-Reward-SQL, which uses graph matching and stepwise rewards to avoid the execution latency and GPU memory overhead of earlier reward models, and reports consistent performance improvements on benchmarks such as Spider and BIRD.

Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel reward model framework for RL-based Text-to-SQL named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing time cost and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and readability of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.
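To make the idea of execution-free, stepwise supervision over CTE subqueries concrete, here is a minimal Python sketch. It is not the paper's GMNScore or StepRTM. It assumes a simple `WITH name AS (...)` syntax (no `WITH RECURSIVE`, no parentheses inside string literals) and swaps in a token-level Jaccard similarity as a stand-in for graph matching. The names `split_ctes`, `jaccard`, and `stepwise_rewards` are hypothetical and chosen only for illustration.

```python
import re

CTE_HEAD = re.compile(r"\s*(\w+)\s+as\s*\(", re.I)
COMMA = re.compile(r"\s*,")


def split_ctes(sql):
    """Split a query into [(cte_name, body), ..., ("__final__", main_query)].

    Assumption: plain `WITH a AS (...), b AS (...) SELECT ...` syntax only.
    """
    text = sql.strip().rstrip(";")
    head = re.match(r"with\s+", text, re.I)
    if not head:
        return [("__final__", text)]
    pos, steps = head.end(), []
    while True:
        m = CTE_HEAD.match(text, pos)
        if not m:
            break
        # Walk forward to the parenthesis that closes this CTE body.
        depth, i = 1, m.end()
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        steps.append((m.group(1), text[m.end():i - 1].strip()))
        pos = i
        comma = COMMA.match(text, pos)
        if not comma:
            break
        pos = comma.end()
    steps.append(("__final__", text[pos:].strip()))
    return steps


def jaccard(a, b):
    """Token-set Jaccard similarity; a crude stand-in for graph matching."""
    ta = set(re.findall(r"\w+|[^\w\s]", a.lower()))
    tb = set(re.findall(r"\w+|[^\w\s]", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def stepwise_rewards(candidate_sql, reference_sql):
    """Score each candidate step against its best-matching reference step.

    No database is queried, so the reward is execution-free.
    """
    ref_bodies = [body for _, body in split_ctes(reference_sql)]
    return [
        (name, max(jaccard(body, ref) for ref in ref_bodies))
        for name, body in split_ctes(candidate_sql)
    ]


reference = "WITH big AS (SELECT id FROM orders WHERE total > 100) SELECT COUNT(*) FROM big"
candidate = "WITH large AS (SELECT id FROM orders WHERE total > 100) SELECT COUNT(*) FROM large"
for step_name, score in stepwise_rewards(candidate, reference):
    print(step_name, round(score, 2))
```

In this toy setup each CTE receives its own score, so a partially correct query still earns credit for the subqueries it gets right. A token-overlap score is sensitive to surface details such as alias names, which is one reason a structural, graph-based comparison is attractive.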

