CL · Aug 11, 2020

Hybrid Ranking Network for Text-to-SQL

arXiv:2008.04759v1 · 93 citations
Originality: Incremental advance
AI Analysis

This work addresses the inefficient use of pre-trained language models in Text-to-SQL query generation. The advance is incremental, since it builds on existing pre-trained models.

The paper tackles the problem of underutilizing pre-trained language models in Text-to-SQL by proposing HydraNet, which breaks the task into column-wise ranking and decoding, achieving top performance on the WikiSQL leaderboard.

In this paper, we study how to leverage pre-trained language models in Text-to-SQL. We argue that previous approaches underutilize the base language models by concatenating all columns together with the NL question and feeding them into the base language model in the encoding stage. We propose a neat approach called Hybrid Ranking Network (HydraNet) which breaks down the problem into column-wise ranking and decoding and finally assembles the column-wise outputs into a SQL query by straightforward rules. In this approach, the encoder is given an NL question and one individual column, which perfectly aligns with the original tasks BERT/RoBERTa are trained on, and hence we avoid any ad-hoc pooling or additional encoding layers which are necessary in prior approaches. Experiments on the WikiSQL dataset show that the proposed approach is very effective, achieving the top place on the leaderboard.
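The column-wise decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ColumnOutput`, `make_pair`, `assemble_sql`, and `mock_scorer`, the ordering of column and question inside the pair, the 0.5 threshold, and the assembly rules are all assumptions made for illustration, and the per-column scores are hard-coded where a fine-tuned BERT/RoBERTa encoder would normally produce them.

```python
from typing import Callable, List, NamedTuple, Optional


class ColumnOutput(NamedTuple):
    """Hypothetical per-column outputs that an encoder head might produce."""
    select_score: float                # relevance of the column to SELECT
    where_score: float                 # relevance of the column to WHERE
    where_op: str = "="
    where_value: Optional[str] = None


def make_pair(question: str, column: str) -> str:
    # One encoder input per column, in sentence-pair form.
    # The column-first ordering is an assumption of this sketch.
    return f"[CLS] {column} [SEP] {question} [SEP]"


def assemble_sql(
    question: str,
    table: str,
    columns: List[str],
    score_column: Callable[[str], ColumnOutput],
    where_threshold: float = 0.5,      # assumed cutoff, not from the paper
) -> str:
    # Score every (question, column) pair independently.
    outputs = {c: score_column(make_pair(question, c)) for c in columns}

    # Rank columns: the best SELECT score wins the SELECT clause.
    select_col = max(columns, key=lambda c: outputs[c].select_score)

    # Keep columns whose WHERE score clears the threshold and that carry a value.
    where_cols = [
        c for c in columns
        if outputs[c].where_score > where_threshold
        and outputs[c].where_value is not None
    ]

    # Assemble the column-wise outputs into one query with simple rules.
    sql = f"SELECT {select_col} FROM {table}"
    if where_cols:
        conditions = [
            f"{c} {outputs[c].where_op} '{outputs[c].where_value}'"
            for c in where_cols
        ]
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Made-up scores standing in for a trained model.
MOCK_SCORES = {
    "player": ColumnOutput(0.9, 0.1),
    "country": ColumnOutput(0.2, 0.8, "=", "Spain"),
    "year": ColumnOutput(0.1, 0.05),
}


def mock_scorer(pair: str) -> ColumnOutput:
    # Recover the column text from the pair and look up its mock scores.
    column = pair[len("[CLS] "):].split(" [SEP] ")[0]
    return MOCK_SCORES[column]


query = assemble_sql(
    "Which player is from Spain?",
    "players",
    ["player", "country", "year"],
    mock_scorer,
)
print(query)  # SELECT player FROM players WHERE country = 'Spain'
```

Because each column is scored on its own, the encoder input stays a plain sentence pair regardless of how many columns the table has. Swapping `mock_scorer` for a real model is the only change the sketch would need to produce learned scores.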

Code Implementations: 2 repos