CL · AI · DB · Mar 9, 2024

Schema-Aware Multi-Task Learning for Complex Text-to-SQL

arXiv:2403.09706v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This work improves text-to-SQL parsing for complex queries that involve multiple tables or columns; it is an incremental advance in a domain-specific area.

The paper tackles the problem of generating complex SQL queries from natural language by addressing challenges in schema linking and alignment, resulting in a method that outperforms baselines on the Spider benchmark, particularly in hard scenarios.

Conventional text-to-SQL parsers struggle to synthesize complex SQL queries that involve multiple tables or columns, because it is difficult both to identify the correct schema items and to align the question accurately with those items. To address this issue, we present a schema-aware multi-task learning framework, named MTSQL, for complex SQL queries. Specifically, we design a schema linking discriminator module that distinguishes valid question-schema links; these distinctive linking relations explicitly guide the encoder and improve alignment quality. On the decoder side, we define six types of relationships between tables and columns (e.g., WHERE_TC) and introduce an operator-centric triple extractor that recognizes schema items connected by these predefined relationships. We then use the predicted triples to build a rule set of grammar constraints that filters SQL operators and schema items during SQL generation. On Spider, a challenging cross-domain text-to-SQL benchmark, experimental results show that MTSQL is more effective than the baselines, especially on extremely hard queries. Further analyses confirm that the approach yields promising improvements on complex SQL queries.
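The abstract describes grammar constraints built from predicted table-column triples but gives no implementation details. The sketch below is one hypothetical reading, not MTSQL's actual code. It assumes each triple has the layout (table, relation, column). Only the label WHERE_TC comes from the abstract: the other relation names (SELECT_TC, GROUPBY_TC, ORDERBY_TC) and the helpers `allowed_clauses`, `allowed_columns`, and `filter_candidates` are illustrative inventions. The fallback to unfiltered scores when no candidate survives is likewise an assumed design choice, included so that a missed triple cannot block decoding.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical relation labels: only WHERE_TC appears in the abstract;
# the rest stand in for the paper's six (unnamed here) relationship types.
RELATION_TO_CLAUSE: Dict[str, str] = {
    "SELECT_TC": "SELECT",
    "WHERE_TC": "WHERE",
    "GROUPBY_TC": "GROUP BY",
    "ORDERBY_TC": "ORDER BY",
}


@dataclass(frozen=True)
class Triple:
    """A predicted (table, relation, column) triple (assumed layout)."""
    table: str
    relation: str
    column: str


def allowed_clauses(triples: Iterable[Triple]) -> Set[str]:
    """SQL clauses the decoder may open: SELECT always, others only if a triple supports them."""
    clauses = {"SELECT"}
    for t in triples:
        clause = RELATION_TO_CLAUSE.get(t.relation)
        if clause is not None:
            clauses.add(clause)
    return clauses


def allowed_columns(triples: Iterable[Triple], clause: str) -> Set[Tuple[str, str]]:
    """(table, column) pairs that some predicted triple licenses inside `clause`."""
    return {
        (t.table, t.column)
        for t in triples
        if RELATION_TO_CLAUSE.get(t.relation) == clause
    }


def filter_candidates(
    scores: Dict[Tuple[str, str], float],
    triples: List[Triple],
    clause: str,
) -> Dict[Tuple[str, str], float]:
    """Keep only column candidates licensed for `clause`.

    Assumed fallback: if the constraint would remove every candidate,
    return the unfiltered scores so decoding can still proceed.
    """
    permitted = allowed_columns(triples, clause)
    kept = {pair: s for pair, s in scores.items() if pair in permitted}
    return kept if kept else dict(scores)


if __name__ == "__main__":
    # Toy schema and scores, invented for illustration.
    triples = [
        Triple("singer", "SELECT_TC", "name"),
        Triple("singer", "WHERE_TC", "age"),
    ]
    scores = {("singer", "age"): 0.4, ("singer", "name"): 0.5, ("concert", "year"): 0.1}
    print(sorted(allowed_clauses(triples)))            # ['SELECT', 'WHERE']
    print(filter_candidates(scores, triples, "WHERE"))  # {('singer', 'age'): 0.4}
```

In this reading, the triples act as a hard mask at two levels. Clause-level operators such as GROUP BY are unavailable unless some triple supports them, and within a clause only the licensed (table, column) pairs remain candidates.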

