CL, AI, DB · Jul 3, 2024

Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning

arXiv:2407.03227v2 · 34 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the challenges of deploying business intelligence solutions over large commercial database schemata, but it appears incremental because it builds on existing retrieval-augmented generation methods.

The paper tackles the problem of Text-to-SQL semantic parsing by proposing ASTReS, which uses abstract syntax trees and schema pruning to improve retrieval-augmented generation, showing improvements over state-of-the-art baselines on monolingual and cross-lingual benchmarks.

We focus on Text-to-SQL semantic parsing from the perspective of retrieval-augmented generation. Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose $\text{ASTReS}$, which dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. Furthermore, we investigate the extent to which an in-parallel semantic parser can be leveraged to generate approximated versions of the expected SQL queries in support of our retrieval. We take this approach to the extreme: we adapt a model of fewer than $500$M parameters to act as an extremely efficient approximator, enhancing it with the ability to process schemata in a parallelised manner. We apply $\text{ASTReS}$ to monolingual and cross-lingual benchmarks for semantic parsing, showing improvements over state-of-the-art baselines. Comprehensive experiments highlight the contribution of the modules involved in this retrieval-augmented generation setting, revealing interesting directions for future work.
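To make the retrieval idea in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes an approximated SQL query is already available (in the paper it would come from the small approximator model) and uses it in two ways: to rank a pool of few-shot examples by structural similarity, and to prune the schema down to the tables and columns the approximation mentions. A real system would compare parsed abstract syntax trees; here a masked token skeleton compared with `difflib.SequenceMatcher` stands in for that comparison. The names `skeleton`, `rank_examples`, `prune_schema`, the keyword list, and the toy schema are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Small, illustrative keyword list (an assumption, not a full SQL grammar).
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "join",
    "on", "and", "or", "not", "in", "as", "limit", "distinct", "desc",
    "asc", "count", "sum", "avg", "min", "max",
}

# String literals, numbers, words, then any single punctuation character.
TOKEN_RE = re.compile(r"'[^']*'|\d+(?:\.\d+)?|\w+|[^\s\w]")


def skeleton(sql):
    """Mask identifiers and literals so only the query structure remains.

    This is a crude stand-in for an AST: it keeps keywords and operators
    in order but ignores nesting.
    """
    out = []
    for tok in TOKEN_RE.findall(sql):
        low = tok.lower()
        if low in SQL_KEYWORDS:
            out.append(low)
        elif tok[0] == "'" or tok[0].isdigit():
            out.append("<val>")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("<id>")
        else:
            out.append(tok)
    return out


def rank_examples(approx_sql, pool, k=3):
    """Return the k (question, sql) pairs most structurally similar to approx_sql."""
    target = skeleton(approx_sql)

    def score(example):
        return SequenceMatcher(None, target, skeleton(example[1])).ratio()

    return sorted(pool, key=score, reverse=True)[:k]


def prune_schema(approx_sql, schema):
    """Keep only the tables and columns whose names appear in approx_sql."""
    mentioned = {
        tok.lower()
        for tok in TOKEN_RE.findall(approx_sql)
        if (tok[0].isalpha() or tok[0] == "_") and tok.lower() not in SQL_KEYWORDS
    }
    return {
        table: [col for col in cols if col.lower() in mentioned]
        for table, cols in schema.items()
        if table.lower() in mentioned
    }


# Toy data, invented for illustration only.
approx = "SELECT name FROM singer WHERE age > 30"
pool = [
    ("How many singers are there?", "SELECT count(*) FROM singer"),
    ("Names of players older than 40", "SELECT name FROM player WHERE age > 40"),
    ("Latest entry", "SELECT a FROM t ORDER BY b DESC LIMIT 1"),
]
schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "year"],
}

few_shot = rank_examples(approx, pool, k=1)
pruned = prune_schema(approx, schema)
```

In this toy run the top-ranked example is the `player` query, because its skeleton matches the approximation exactly even though the table differs, and the pruned schema keeps only `singer` with its `name` and `age` columns. Swapping the skeleton for a proper SQL parser and a tree-based distance would bring the sketch closer to the AST-based ranking the abstract describes.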
