CL
Oct 2, 2025

Exploring Database Normalization Effects on SQL Generation

arXiv:2510.01989v1
2 citations
h-index: 5
CIKM
Originality: Incremental advance
AI Analysis

This work addresses the overlooked impact of schema design on NL2SQL systems and offers insights that help developers optimize interfaces for the query types they must support. The contribution is incremental, since it examines a single factor within existing methods.

The study systematically examined how database normalization affects SQL generation by evaluating eight large language models on synthetic and real-world datasets with varying normalization levels. Results showed denormalized schemas improved accuracy for simple retrieval queries, while normalized schemas performed better for aggregation queries, with few-shot examples mitigating challenges in normalized cases.

Schema design, particularly normalization, is a critical yet often overlooked factor in natural language to SQL (NL2SQL) systems. Most prior research evaluates models on fixed schemas, overlooking the influence of design on performance. We present the first systematic study of schema normalization's impact, evaluating eight leading large language models on synthetic and real-world datasets with varied normalization levels. We construct controlled synthetic datasets with formal normalization (1NF-3NF) and real academic paper datasets with practical schemes. Our results show that denormalized schemas offer high accuracy on simple retrieval queries, even with cost-effective models in zero-shot settings. In contrast, normalized schemas (2NF/3NF) introduce challenges such as errors in base table selection and join type prediction; however, these issues are substantially mitigated by providing few-shot examples. For aggregation queries, normalized schemas yielded better performance, mainly due to their robustness against the data duplication and NULL value issues that cause errors in denormalized schemas. These findings suggest that the optimal schema design for NL2SQL applications depends on the types of queries to be supported. Our study demonstrates the importance of considering schema design when developing NL2SQL interfaces and integrating adaptive schema selection for real-world scenarios.
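The contrast the abstract draws can be seen on a toy example. The sketch below is not the paper's benchmark: the table names (`paper_flat`, `paper`, `paper_author`), the columns, and the rows are all hypothetical, chosen only to show why a denormalized table makes simple retrieval join-free while a naive aggregate over it double-counts duplicated rows, and why the join type matters once the schema is normalized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical toy data: Paper A has two authors, Paper B has none recorded.
cur.executescript("""
-- Denormalized: one row per (paper, author), so paper facts repeat.
CREATE TABLE paper_flat (paper_id INTEGER, title TEXT, citations INTEGER, author TEXT);
INSERT INTO paper_flat VALUES
  (1, 'Paper A', 10, 'Ann'),
  (1, 'Paper A', 10, 'Bob'),
  (2, 'Paper B', 4, NULL);

-- Normalized: paper facts are stored once; authorship lives in its own table.
CREATE TABLE paper (paper_id INTEGER PRIMARY KEY, title TEXT, citations INTEGER);
CREATE TABLE paper_author (paper_id INTEGER REFERENCES paper(paper_id), author TEXT);
INSERT INTO paper VALUES (1, 'Paper A', 10), (2, 'Paper B', 4);
INSERT INTO paper_author VALUES (1, 'Ann'), (1, 'Bob');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return cur.execute(sql).fetchone()[0]

# Simple retrieval: the flat table needs no join at all.
flat_authors = [row[0] for row in cur.execute(
    "SELECT author FROM paper_flat WHERE title = 'Paper A' ORDER BY author")]

# The same question on the normalized schema requires picking a base table and a join.
norm_authors = [row[0] for row in cur.execute(
    "SELECT pa.author FROM paper p "
    "JOIN paper_author pa ON pa.paper_id = p.paper_id "
    "WHERE p.title = 'Paper A' ORDER BY pa.author")]

# Aggregation: a naive SUM over the flat table counts Paper A's citations twice.
flat_total = scalar("SELECT SUM(citations) FROM paper_flat")
norm_total = scalar("SELECT SUM(citations) FROM paper")

# Join type: an INNER JOIN silently drops the paper without authors; a LEFT JOIN keeps it.
inner_papers = scalar(
    "SELECT COUNT(DISTINCT p.paper_id) FROM paper p "
    "JOIN paper_author pa ON pa.paper_id = p.paper_id")
left_papers = scalar(
    "SELECT COUNT(DISTINCT p.paper_id) FROM paper p "
    "LEFT JOIN paper_author pa ON pa.paper_id = p.paper_id")

print(flat_authors, norm_authors)
print(flat_total, norm_total)
print(inner_papers, left_papers)
```

In this toy setting both retrieval queries return the same authors, but only the flat one avoids a join. The flat `SUM` is inflated by the duplicated row, and the two join types disagree on how many papers exist. These are the kinds of errors the abstract attributes to denormalized aggregation and to join type prediction, respectively.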

