AI, CL, DB, LG · Nov 13, 2024

A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

arXiv:2411.08599v3 · 64 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating accurate SQL queries from natural language for database users, representing a strong domain-specific advancement.

The paper tackles the challenge of improving large language model performance in text-to-SQL tasks by introducing XiYan-SQL, a multi-generator ensemble framework that achieves state-of-the-art execution accuracy of 75.63% on Bird, 89.65% on Spider, 69.86% on SQL-Eval, and 41.20% on NL2GQL benchmarks.
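
As a rough illustration of what "execution accuracy" measures, the sketch below runs a predicted query and a reference query against the same database and counts a hit when both return the same rows. It assumes SQLite and an order-insensitive row comparison; the official Bird and Spider evaluation scripts differ in details, and the table, queries, and function names here are invented for the example.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries run and yield the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter ignores row order but keeps duplicate rows distinct.
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    hits = sum(execution_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)


# Toy database (hypothetical data, not taken from any benchmark).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Bo', 24), (3, 'Cy', 45);
    """
)

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Executes, but returns extra rows: counted as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age > 30"),
    # Syntax error: counted as wrong.
    ("SELEC name FROM singer",
     "SELECT name FROM singer"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

Because the comparison is on results rather than on SQL text, two differently written queries can both count as correct, which is why the metric suits ensembles that produce varied candidates.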

To tackle the challenges of large language model performance in natural language to SQL tasks, we introduce XiYan-SQL, an innovative framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to enhance the understanding of database structures. To enhance the quality and diversity of generated candidate SQL queries, XiYan-SQL integrates the significant potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On one hand, we propose a series of training strategies to fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities. The refiner optimizes each candidate by correcting logical or syntactical errors. To address the challenge of identifying the best candidate, we fine-tune a selection model to distinguish nuances of candidate SQL queries. The experimental results on multiple dialect datasets demonstrate the robustness of XiYan-SQL in addressing challenges across different scenarios. Overall, our proposed XiYan-SQL achieves state-of-the-art execution accuracy of 75.63% on the Bird benchmark, 89.65% on the Spider test set, 69.86% on SQL-Eval, and 41.20% on NL2GQL. The proposed framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods.
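
The generate, refine, and select stages described in the abstract can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the generators and the `fixer` are stub callables standing in for fine-tuned and ICL models, `schema_text` is a plain placeholder rather than real M-Schema output, and the selector is a simple majority vote over execution results used in place of the paper's fine-tuned selection model. All names are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Callable, List, Optional

Generator = Callable[[str, str], str]  # (question, schema_text) -> SQL
Fixer = Callable[[str, str], str]      # (sql, error_message) -> repaired SQL


def run(conn, sql):
    """Execute sql and return an order-insensitive result, or None on error."""
    try:
        return tuple(sorted(conn.execute(sql).fetchall(), key=repr))
    except sqlite3.Error:
        return None


def refine(conn, sql: str, fixer: Fixer) -> str:
    """Give a failing candidate one repair attempt; pass working ones through."""
    try:
        conn.execute(sql).fetchall()
        return sql
    except sqlite3.Error as exc:
        return fixer(sql, str(exc))


def select(conn, candidates: List[str]) -> Optional[str]:
    """Stand-in selector: return the first candidate with the most common result."""
    results = [run(conn, sql) for sql in candidates]
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None
    best_result, _ = votes.most_common(1)[0]
    return next(sql for sql, r in zip(candidates, results) if r == best_result)


def answer(conn, question, schema_text, generators: List[Generator], fixer: Fixer):
    candidates = [g(question, schema_text) for g in generators]
    refined = [refine(conn, c, fixer) for c in candidates]
    return select(conn, refined)


# Toy setup: three stub "generators" with different habits (hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Bo', 24);
    """
)
generators = [
    lambda q, s: "SELECT name FROM singer WHERE age > 30",
    lambda q, s: "SELECT name FROM singers WHERE age > 30",  # wrong table name
    lambda q, s: "SELECT name FROM singer WHERE age < 30",   # wrong comparison
]
fixer = lambda sql, err: sql.replace("singers", "singer")  # stub repair step

chosen = answer(
    conn,
    "Which singers are older than 30?",
    "singer(id, name, age)",
    generators,
    fixer,
)
print(chosen)
```

In this toy run the refiner repairs the second candidate's table name, so two of the three candidates agree and the vote picks their query. A learned selector, as the abstract describes, would instead compare the candidates directly rather than rely on agreement.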

Code Implementations: 5 repos
