CL | Feb 16, 2024

Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL

arXiv:2402.10663v3 | 31 citations | h-index: 15 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the issue of demonstration diversity for researchers and practitioners in text-to-SQL, offering an incremental improvement over existing methods.

The paper tackles two limitations of human-labeled demonstration pools for text-to-SQL in-context learning: insufficient diversity and high labeling overhead. It proposes a human-free iterative fusion method that builds a high-diversity demonstration pool, achieving average improvements of 3.2% with human labeling and 5.0% without it on several mainstream datasets.

In-context learning with large language models (LLMs) has become the mainstream approach in text-to-SQL research. Previous works have discussed how to select demonstrations related to the user question from a human-labeled demonstration pool. However, human labeling suffers from insufficient diversity and high labeling overhead. Therefore, in this paper, we discuss how to measure and improve the diversity of demonstrations for text-to-SQL. We present a metric to measure demonstration diversity and show experimentally that existing labeled data is insufficiently diverse. Based on this finding, we propose fusing iteratively for demonstrations (Fused), which builds a high-diversity demonstration pool through human-free, multi-iteration synthesis, improving diversity and lowering labeling cost. Our method achieves average improvements of 3.2% and 5.0% with and without human labeling, respectively, on several mainstream datasets, which demonstrates the effectiveness of Fused.
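The abstract does not spell out the diversity metric or the fusion step, so the following is only a rough sketch of the general idea, not the paper's implementation. It assumes diversity can be approximated by the mean pairwise Jaccard distance between SQL token sets, and that fusion is a caller-supplied function (in practice, presumably an LLM prompt) that combines two existing demonstrations into a new one. The names `pool_diversity`, `fuse_iteratively`, and the `fuse` callback are hypothetical.

```python
import random
from itertools import combinations
from typing import Callable, List, Optional, Tuple

Demo = Tuple[str, str]  # (question, sql); a hypothetical representation


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def pool_diversity(pool: List[Demo]) -> float:
    """Mean pairwise distance between the SQL of all demonstrations.

    Assumption: this is a stand-in proxy, not the metric defined in the paper.
    """
    if len(pool) < 2:
        return 0.0
    pairs = list(combinations(pool, 2))
    return sum(jaccard_distance(a[1], b[1]) for a, b in pairs) / len(pairs)


def fuse_iteratively(
    seed_pool: List[Demo],
    fuse: Callable[[Demo, Demo], Optional[Demo]],
    iterations: int,
    samples_per_iteration: int,
    rng: random.Random,
) -> List[Demo]:
    """Grow a pool by repeatedly fusing random pairs of demonstrations.

    `fuse` is a hypothetical callback; it may return None to reject a pair.
    Newly fused demonstrations can themselves be sampled in later rounds.
    """
    pool = list(seed_pool)
    seen = set(pool)
    for _ in range(iterations):
        for _ in range(samples_per_iteration):
            if len(pool) < 2:
                break
            a, b = rng.sample(pool, 2)
            candidate = fuse(a, b)
            if candidate is not None and candidate not in seen:
                seen.add(candidate)
                pool.append(candidate)
    return pool
```

With a toy `fuse` that joins two queries with `UNION`, calling `pool_diversity` before and after `fuse_iteratively` gives a quick check of whether the synthesized pool is more varied than the seed pool. A real pipeline would also need to validate that each fused SQL query executes against its schema.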

Code Implementations: 1 repo
