AI, CL | Oct 11, 2020

Data Agnostic RoBERTa-based Natural Language to SQL Query Generation

arXiv:2010.05243v3 | 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses data privacy concerns in NL2SQL by enabling zero-shot learning from the natural language question and the table schema alone. The contribution is incremental, as it does not achieve state-of-the-art results.

The paper tackles the problem of generating SQL queries from natural language without needing the actual table data during training, and reports a test set execution accuracy of 76.7%.
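Execution accuracy, the metric quoted above, is usually computed by running the predicted and the reference query against the database and comparing the returned rows. The following is a minimal sketch of that idea using Python's standard `sqlite3` module. The function name `execution_accuracy`, the `players` table, and the example queries are illustrative assumptions, not taken from the paper or its repository.

```python
import sqlite3


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result sets match.

    Hypothetical helper: row order is ignored, and a predicted query
    that fails to execute is counted as a miss.
    """
    def run(sql):
        try:
            # Sort by repr so rows with mixed types still compare safely.
            return sorted(conn.execute(sql).fetchall(), key=repr)
        except sqlite3.Error:
            return None

    hits = 0
    for predicted, gold in pairs:
        gold_rows = run(gold)
        if gold_rows is not None and run(predicted) == gold_rows:
            hits += 1
    return hits / len(pairs) if pairs else 0.0


# Toy database, used only at evaluation time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bob", "Blue", 7), ("Cy", "Red", 3)],
)

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM players WHERE points > 5",
     "SELECT name FROM players WHERE points >= 7"),
    # Different results: counted as wrong.
    ("SELECT COUNT(*) FROM players",
     "SELECT COUNT(*) FROM players WHERE team = 'Red'"),
    # Invalid column name: counted as wrong.
    ("SELECT nme FROM players", "SELECT name FROM players"),
]
score = execution_accuracy(conn, pairs)
```

Because the comparison is made on results rather than on query text, two differently written queries count as a match when they return the same rows. Table contents are needed here only to score the predictions, not to train the model.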

Relational databases are among the most widely used architectures to store massive amounts of data in the modern world. However, there is a barrier between these databases and the average user, who often lacks the knowledge of a query language such as SQL required to interact with the database. The NL2SQL task aims at finding deep learning approaches to solve this problem by converting natural language questions into valid SQL queries. Given the sensitive nature of some databases and the growing need for data privacy, we present an approach with data privacy at its core. We pass RoBERTa embeddings and data-agnostic knowledge vectors into LSTM-based submodels to predict the final query. Although we have not achieved state-of-the-art results, we have eliminated the need for the table data from the training of the model onward, and have achieved a test set execution accuracy of 76.7%. By eliminating the table data dependency during training, we have created a model capable of zero-shot learning based on the natural language question and table schema alone.
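The abstract does not spell out how the data-agnostic knowledge vectors are built. One plausible reading is that they mark overlaps between the question and the column headers without ever looking at cell values. The sketch below illustrates that reading only: the function `schema_knowledge_vectors`, the tokenizer, and the 0/1/2 encoding are assumptions made for this example, not the authors' implementation.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def schema_knowledge_vectors(question, headers):
    """Build match vectors from the question and column headers only.

    Hypothetical encoding:
      question vector: 1 if the token appears in any header, else 0
      header vector:   0 = no match, 1 = partial match, 2 = full match
    No table rows are read, which keeps the features data-agnostic.
    """
    q_tokens = tokenize(question)
    header_tokens = [tokenize(h) for h in headers]
    header_words = {w for toks in header_tokens for w in toks}

    question_vec = [1 if t in header_words else 0 for t in q_tokens]

    q_set = set(q_tokens)
    header_vec = []
    for toks in header_tokens:
        matched = sum(1 for w in toks if w in q_set)
        if matched == 0:
            header_vec.append(0)
        elif matched < len(toks):
            header_vec.append(1)
        else:
            header_vec.append(2)
    return q_tokens, question_vec, header_vec


q_tokens, question_vec, header_vec = schema_knowledge_vectors(
    "Which team has the most points?",
    ["Name", "Team", "Total Points"],
)
print(list(zip(q_tokens, question_vec)))
print(header_vec)
```

In a full model, vectors like these would be combined with the RoBERTa token embeddings before being fed to the LSTM-based submodels. How that combination is done is not stated in the abstract.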

Code Implementations: 1 repo