CL, DB, LG
Mar 15, 2022

Evaluating the Text-to-SQL Capabilities of Large Language Models

Cambridge
arXiv:2204.00498v1
152 citations
h-index: 31
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of assessing and improving SQL generation from natural language for database users, and it reports incremental gains in few-shot settings.

The paper evaluates Codex's Text-to-SQL capabilities. It finds that Codex is a strong baseline on the Spider benchmark without any finetuning, and that on the GeoQuery and Scholar benchmarks, a small number of in-domain examples in the prompt lets Codex outperform state-of-the-art models finetuned on those same few-shot examples.

We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform better than state-of-the-art models finetuned on such few-shot examples.
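To make the few-shot setting described in the abstract concrete, here is a minimal sketch of how a prompt with in-domain examples might be assembled. The prompt layout, the function name `build_few_shot_prompt`, and the toy schema and questions are all illustrative assumptions; they are not taken from the paper, which may format its prompts differently.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    schema_ddl: str,
    examples: List[Tuple[str, str]],
    question: str,
) -> str:
    """Assemble a Text-to-SQL prompt: schema, worked examples, then the new question.

    Hypothetical layout for illustration only, not the paper's exact format.
    """
    parts = [schema_ddl.strip(), ""]
    # Each in-domain example is a (question, SQL) pair shown before the target question.
    for example_question, example_sql in examples:
        parts.append(f"-- Question: {example_question}")
        parts.append(example_sql.strip())
        parts.append("")
    # End with the new question and an open SELECT for the model to complete.
    parts.append(f"-- Question: {question}")
    parts.append("SELECT")
    return "\n".join(parts)


# Toy schema and example pair (assumed, loosely in the spirit of a geography database).
schema = "CREATE TABLE state (state_name TEXT, population INTEGER, area REAL);"
shots = [
    ("How many states are there?", "SELECT COUNT(*) FROM state;"),
]
prompt = build_few_shot_prompt(schema, shots, "Which state has the largest area?")
print(prompt)
```

Passing an empty `examples` list gives a prompt with the schema and question only, which corresponds to the no-finetuning, no-example setting; adding pairs to the list moves it toward the few-shot setting.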
