CL | Feb 16, 2024

Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL

Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

arXiv:2402.10663v3 | 16.431 citations | h-index: 15 | Has Code | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the issue of demonstration diversity for researchers and practitioners in text-to-SQL, offering an incremental improvement over existing methods.

The paper targets two weaknesses of human-labeled demonstration pools for text-to-SQL in-context learning: insufficient diversity and high labeling overhead. It proposes a human-free iterative fusion method (Fused) that builds a high-diversity demonstration pool, yielding average improvements of 3.2% and 5.0% with and without human labeling, respectively, on several mainstream datasets.

In-context learning based on large language models (LLMs) has become the mainstream approach in text-to-SQL research. Previous works have discussed how to select demonstrations related to the user question from a human-labeled demonstration pool. However, human labeling suffers from insufficient diversity and high labeling overhead. Therefore, in this paper, we discuss how to measure and improve the diversity of demonstrations for text-to-SQL. We present a metric to measure demonstration diversity and show experimentally that existing labeled data is insufficiently diverse. Based on this finding, we propose fusing iteratively for demonstrations (Fused), which builds a high-diversity demonstration pool through human-free, multi-iteration synthesis, improving diversity and lowering labeling cost. Our method achieves average improvements of 3.2% and 5.0% with and without human labeling, respectively, on several mainstream datasets, which demonstrates the effectiveness of Fused.
