IR, CL, DB | Mar 28, 2025

Domain Specific Question to SQL Conversion with Embedded Data Balancing Technique

arXiv:2504.08753v1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurate SQL generation for domain-specific queries, which is an incremental improvement over existing methods.

The paper tackles the problem of generating accurate SQL queries from domain-specific natural language questions by addressing value recognition errors, which account for 29% of failures in existing systems, and reports a 10.98% improvement in accuracy over the state-of-the-art model on the WikiSQL dataset.

The rise of deep learning in natural language processing has fostered the creation of text-to-SQL (structured query language) models composed of an encoder and a decoder. Researchers have experimented with various intermediate processing steps, such as schema linking, table-type awareness, and value extraction, to generate accurate SQL results for the user's question. However, error analysis of the failed cases on these systems shows that 29 percent of the errors occur because the system was unable to understand the values expressed by the user in their question. This challenge affects the generation of accurate SQL queries, especially when dealing with domain-specific terms and specific value conditions, where traditional methods struggle to maintain consistency and precision. To overcome these obstacles, this work proposes two intermediate steps: applying a data balancing technique and oversampling domain-specific queries. Together these refine the model architecture to enhance value recognition and fine-tune the model for domain-specific questions. The proposed solution achieved a 10.98 percent improvement in accuracy compared to the state-of-the-art model tested on the WikiSQL dataset when converting user questions to SQL queries. Applying the oversampling technique to the domain-specific questions showed a significant improvement compared with traditional approaches.
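The abstract does not say how the oversampling is carried out, so the following is only a minimal sketch of one common form, random oversampling, applied to question/SQL training pairs. The function name `oversample_domain`, the `domain` flag on each example, the `target_ratio` parameter, and the sample questions are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def oversample_domain(examples, is_domain_specific, target_ratio=1.0, seed=0):
    """Randomly duplicate domain-specific examples until they number at
    least target_ratio times the general examples (illustrative only)."""
    rng = random.Random(seed)
    domain = [ex for ex in examples if is_domain_specific(ex)]
    general = [ex for ex in examples if not is_domain_specific(ex)]
    if not domain:
        # Nothing to duplicate; return the data unchanged.
        return list(examples)
    # Number of extra domain-specific copies needed to reach the target.
    needed = max(0, int(target_ratio * len(general)) - len(domain))
    extra = [rng.choice(domain) for _ in range(needed)]
    balanced = general + domain + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical training pairs: four general questions, one domain-specific.
train = [
    {"question": "How many players are from Canada?",
     "sql": "SELECT COUNT(player) FROM t WHERE country = 'Canada'",
     "domain": False},
    {"question": "Which team won in 1998?",
     "sql": "SELECT team FROM t WHERE year = 1998",
     "domain": False},
    {"question": "What is the highest score?",
     "sql": "SELECT MAX(score) FROM t",
     "domain": False},
    {"question": "Who directed the film Alien?",
     "sql": "SELECT director FROM t WHERE film = 'Alien'",
     "domain": False},
    {"question": "Which patients have an HbA1c above 6.5?",
     "sql": "SELECT patient FROM t WHERE hba1c > 6.5",
     "domain": True},
]

balanced = oversample_domain(train, lambda ex: ex["domain"])
counts = Counter(ex["domain"] for ex in balanced)
print(counts[True], counts[False])
```

With `target_ratio=1.0` the single domain-specific pair is duplicated until both groups are the same size. Whether the paper balances to parity or to some other ratio is not stated, so the ratio here is an assumption.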

