SQLord: A Robust Enterprise Text-to-SQL Solution via Reverse Data Generation and Workflow Decomposition
This addresses the challenge of NL2SQL for business applications where existing methods fail due to complex logic and lack of domain-specific data, though it appears incremental in its approach.
The paper tackles the problem of transforming natural language to SQL queries for enterprise applications by proposing SQLord, which uses reverse data generation and workflow decomposition, achieving over 90% accuracy in online tests and outperforming state-of-the-art baselines.
Transforming natural language into SQL queries (NL2SQL) is crucial for data-driven business applications. Existing frameworks, trained on open-source datasets, struggle with complex business logic and lack domain-specific data for fine-tuning. Additionally, evaluation methods often require annotated data and executable database environments, which are scarce in real-world scenarios. To address these challenges, we propose SQLord, an enterprise-level NL2SQL framework. First, SQLord introduces a data reverse generation approach to convert raw SQL statements into annotated data for supervised fine-tuning (SFT). Second, it proposes a decomposition method for complex queries using an automated workflow generator. Additionally, SQLord features a comprehensive GPT-Judge evaluation framework, including Execution Evaluation (EXE), Query-SQL Evaluation (QSE), and SQL-SQL Evaluation (SSE), tailored to diverse scenarios. Offline tests significantly outperform state of the art baselines, and online accuracy consistently exceeds 90, highlighting SQLord's advantages and effectiveness in complex real world scenarios. SQLord has been successfully applied across multiple scenarios on the world's largest B2B e-commerce platform.