CL May 20, 2019

Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

arXiv:1905.08205v2
1210 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of converting complex natural-language questions into SQL for database users. It is a strong incremental advance within a specific domain.

The paper tackles SQL generation from natural language over cross-domain databases. It introduces IRNet, which uses an intermediate representation to bridge the mismatch between natural-language intent and SQL implementation details, and to cope with out-of-domain words. IRNet achieves 46.7% accuracy on the Spider benchmark, a 19.5% absolute improvement over the previous state of the art.

We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard.
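
The three-phase decomposition described in the abstract can be illustrated with a toy sketch. This is not IRNet's implementation: the function names `link_schema` and `infer_sql`, the dictionary-based schema, and the dictionary standing in for a SemQL query are all hypothetical, and the neural second phase is replaced by a hand-written intermediate query. The sketch only shows the general idea that an intermediate query can omit the FROM clause, which is then inferred deterministically from the schema.

```python
def link_schema(question, schema, max_n=3):
    """Phase 1 (toy): tag question n-grams that exactly match a table or column name.

    `schema` is a hypothetical dict mapping table name -> list of column names.
    """
    tokens = question.lower().rstrip("?.").split()
    columns = {col for cols in schema.values() for col in cols}
    links = []
    # Try longer n-grams first so multi-word names are found before their parts.
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            if span in schema:
                links.append((span, "table"))
            elif span in columns:
                links.append((span, "column"))
    return links


def infer_sql(ir, schema):
    """Phase 3 (toy): deterministically build SQL from an intermediate query.

    `ir` is a hypothetical stand-in for a SemQL query: it names columns and
    conditions but no tables, so the FROM clause is inferred from the schema.
    Assumes column names are unique across tables and a single table is used.
    """
    owner = {col: table for table, cols in schema.items() for col in cols}
    used = list(ir["select"]) + [cond[0] for cond in ir.get("filter", [])]
    tables = sorted({owner[col] for col in used})
    if len(tables) != 1:
        raise ValueError("toy sketch handles single-table queries only")
    sql = f"SELECT {', '.join(ir['select'])} FROM {tables[0]}"
    if ir.get("filter"):
        conds = " AND ".join(f"{col} {op} {val!r}" for col, op, val in ir["filter"])
        sql += f" WHERE {conds}"
    return sql


# Hypothetical schema and question, for illustration only.
schema = {"singer": ["name", "age", "country"]}
links = link_schema("What is the name of each singer older than 30?", schema)

# Phase 2 would be the grammar-based neural model; here its output is hand-written.
ir = {"select": ["name"], "filter": [("age", ">", 30)]}
sql = infer_sql(ir, schema)
```

Keeping table names out of the intermediate query is the design point the sketch tries to convey: the model predicts what the user asked for, and details that follow mechanically from the schema are filled in afterwards.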

Code Implementations: 5 repos