AI · Sep 16, 2025

GBV-SQL: Guided Generation and SQL2Text Back-Translation Validation for Multi-Agent Text2SQL

Daojun Chen, Xi Wang, Shenyuan Ren, Qingzhi Ma, Pengpeng Zhao, An Liu

arXiv:2509.12612v1 · 3.3 · h-index: 2

Originality: Highly original

AI Analysis

The paper addresses two critical issues in Text2SQL for database querying: semantic alignment between generated queries and user intent, and the integrity of the benchmarks used for evaluation. It offers a robust validation framework while highlighting pervasive flaws in existing datasets.

The paper tackles semantic misinterpretation in Text2SQL generation with GBV-SQL, a multi-agent framework that combines guided generation with SQL2Text back-translation validation. It achieves 63.23% execution accuracy on BIRD and, once flawed examples are removed, up to 97.6% on Spider (test set).
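The generate-then-validate loop can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each agent is modelled as a plain callable standing in for an LLM call: a SQL generator, a SQL2Text back-translator, and an alignment judge that compares the back-translation with the question. The retry-with-feedback policy, the `max_attempts` limit, and every function name here (`generate_with_back_translation`, `toy_generate`, `toy_back_translate`, `toy_judge`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical agent signatures; in a real system each would wrap an LLM call.
Generator = Callable[[str, Optional[str]], str]   # (question, feedback) -> SQL
BackTranslator = Callable[[str], str]             # SQL -> natural-language description
Judge = Callable[[str, str], Tuple[bool, str]]    # (question, description) -> (aligned, feedback)


@dataclass
class GenerationResult:
    sql: str
    description: str
    aligned: bool
    attempts: int


def generate_with_back_translation(
    question: str,
    generate: Generator,
    back_translate: BackTranslator,
    judge: Judge,
    max_attempts: int = 3,
) -> GenerationResult:
    """Generate SQL, translate it back to text, and retry while the text misses the question's intent."""
    feedback: Optional[str] = None
    sql, description = "", ""
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, feedback)
        description = back_translate(sql)
        aligned, feedback = judge(question, description)
        if aligned:
            return GenerationResult(sql, description, True, attempt)
    # No candidate passed validation; return the last one, marked as unaligned.
    return GenerationResult(sql, description, False, max_attempts)


# Toy stand-ins for the agents, for illustration only.
def toy_generate(question: str, feedback: Optional[str]) -> str:
    # The first attempt forgets the filter; a retry with feedback adds it.
    if feedback is None:
        return "SELECT name FROM singer"
    return "SELECT name FROM singer WHERE country = 'France'"


def toy_back_translate(sql: str) -> str:
    if "WHERE country = 'France'" in sql:
        return "List the names of singers from France."
    return "List the names of all singers."


def toy_judge(question: str, description: str) -> Tuple[bool, str]:
    if "France" in question and "France" not in description:
        return False, "The query ignores the country filter in the question."
    return True, ""


result = generate_with_back_translation(
    "What are the names of singers from France?",
    toy_generate, toy_back_translate, toy_judge,
)
print(result.attempts, result.sql)  # 2 SELECT name FROM singer WHERE country = 'France'
```

With the toy agents, the first query drops the country filter. Its back-translation ("all singers") exposes the omission, and the retry with feedback returns the filtered query on the second attempt. Feeding the judge's feedback back to the generator is one plausible design. A real system could instead re-rank several candidates or stop after a single check.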

While Large Language Models have significantly advanced Text2SQL generation, a critical semantic gap persists: syntactically valid queries often misinterpret user intent. To mitigate this challenge, we propose GBV-SQL, a novel multi-agent framework that introduces Guided Generation with SQL2Text Back-translation Validation. In this mechanism, a specialized agent translates the generated SQL back into natural language so that its logical alignment with the original question can be verified. Critically, our investigation reveals that current evaluation is undermined by a systemic issue: the poor quality of the benchmarks themselves. We introduce a formal typology of "Gold Errors", pervasive flaws in the ground-truth data, and demonstrate how they obscure true model performance. On the challenging BIRD benchmark, GBV-SQL achieves 63.23% execution accuracy, a 5.8% absolute improvement. After flawed examples are removed, GBV-SQL achieves 96.5% (dev) and 97.6% (test) execution accuracy on the Spider benchmark. Our work offers both a robust framework for semantic validation and a critical perspective on benchmark integrity, highlighting the need for more rigorous dataset curation.
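Execution accuracy, the metric behind these numbers, counts an example as correct when the predicted SQL returns the same result as the gold SQL. This sketch is a simplification built on Python's built-in `sqlite3` module. It compares results as unordered sets, roughly as BIRD's evaluation script does; Spider's official evaluator is more elaborate. A predicted query that fails to execute counts as wrong. The sketch also shows how excluding examples flagged as having flawed gold queries changes the score. The function names, the toy `singer` table, and the flagged id are hypothetical, and the paper's Gold Error typology is not reproduced here.

```python
import sqlite3
from typing import AbstractSet, List, Tuple

Example = Tuple[str, str, str]  # (example_id, predicted_sql, gold_sql)


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same rows, ignoring order and duplicates."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute counts as wrong
    gold = conn.execute(gold_sql).fetchall()  # gold SQL is assumed to execute
    return set(predicted) == set(gold)


def execution_accuracy(
    conn: sqlite3.Connection,
    examples: List[Example],
    excluded_ids: AbstractSet[str] = frozenset(),
) -> float:
    """Fraction of non-excluded examples whose predicted SQL matches the gold result."""
    kept = [ex for ex in examples if ex[0] not in excluded_ids]
    if not kept:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for _, pred, gold in kept)
    return hits / len(kept)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", "France"), ("Bo", "Spain"), ("Cy", "France")],
)

examples = [
    ("q1", "SELECT name FROM singer WHERE country = 'France'",
           "SELECT name FROM singer WHERE country = 'France'"),
    ("q2", "SELECT COUNT(*) FROM singer", "SELECT COUNT(*) FROM singer"),
    # q3: the gold query ignores a filter the (imagined) question asks for,
    # so a correct prediction is scored as wrong.
    ("q3", "SELECT name FROM singer WHERE country = 'Spain'", "SELECT name FROM singer"),
    # q4: the prediction references a non-existent column and fails to execute.
    ("q4", "SELECT nme FROM singer", "SELECT name FROM singer"),
]

print(execution_accuracy(conn, examples))                       # 0.5 (2 of 4)
print(execution_accuracy(conn, examples, excluded_ids={"q3"}))  # about 0.667 (2 of 3)
```

Dropping the flagged example raises the score from 0.5 to about 0.667 without any change to the predictions. This illustrates how flawed gold queries can distort reported accuracy.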

