Selective Demonstrations for Cross-domain Text-to-SQL
This work addresses the challenge of enhancing large language models for text-to-SQL tasks in new domains without costly annotations, offering a practical solution for database query generation.
The paper proposes ODIS, a demonstration selection framework that improves cross-domain text-to-SQL performance without in-domain annotations by combining out-of-domain examples with synthetically generated in-domain examples. ODIS improves execution accuracy over state-of-the-art approaches by 1.1 and 11.8 points on two cross-domain text-to-SQL datasets.
Large language models (LLMs) with in-context learning have demonstrated impressive generalization capabilities in the cross-domain text-to-SQL task, without the use of in-domain annotations. However, incorporating in-domain demonstration examples has been found to greatly enhance LLMs' performance. In this paper, we delve into the key factors within in-domain examples that contribute to the improvement and explore whether we can harness these benefits without relying on in-domain annotations. Based on our findings, we propose a demonstration selection framework, ODIS, which uses both out-of-domain examples and synthetically generated in-domain examples to construct demonstrations. By retrieving demonstrations from hybrid sources, ODIS leverages the advantages of both, showcasing its effectiveness compared to baseline methods that rely on a single data source. Furthermore, ODIS outperforms state-of-the-art approaches on two cross-domain text-to-SQL datasets, with improvements of 1.1 and 11.8 points in execution accuracy, respectively.
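To make the idea of retrieving demonstrations from hybrid sources concrete, the following is a minimal sketch, not the ODIS implementation. The abstract does not specify the retriever, the number of demonstrations, or the prompt layout, so everything here is an assumption made for illustration: the `Example` type, the `select_demonstrations` helper, the `k_out` and `k_syn` parameters, the toy questions, and the token-overlap (Jaccard) similarity, which stands in for whatever selection strategy the paper actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One demonstration: a question paired with its SQL query (hypothetical type)."""
    question: str
    sql: str
    source: str  # "out_of_domain" or "synthetic_in_domain"


def tokens(text: str) -> set[str]:
    # Crude tokenizer for illustration only: lowercase, drop "?", split on whitespace.
    return set(text.lower().replace("?", " ").split())


def jaccard(a: str, b: str) -> float:
    # Token-overlap similarity; an assumed stand-in for a real retriever.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def top_k(question: str, pool: list[Example], k: int) -> list[Example]:
    # Rank a pool by similarity to the test question and keep the k best.
    ranked = sorted(pool, key=lambda ex: jaccard(question, ex.question), reverse=True)
    return ranked[:k]


def select_demonstrations(
    question: str,
    out_of_domain: list[Example],
    synthetic_in_domain: list[Example],
    k_out: int = 1,
    k_syn: int = 1,
) -> list[Example]:
    # Hybrid selection: draw from both sources instead of relying on one.
    return top_k(question, out_of_domain, k_out) + top_k(question, synthetic_in_domain, k_syn)


if __name__ == "__main__":
    # Toy pools, invented for this example.
    out_pool = [
        Example("How many singers are there?", "SELECT COUNT(*) FROM singer", "out_of_domain"),
        Example("List the names of all airports", "SELECT name FROM airport", "out_of_domain"),
    ]
    syn_pool = [
        Example("How many students are enrolled?", "SELECT COUNT(*) FROM student", "synthetic_in_domain"),
        Example("What is the name of each course?", "SELECT name FROM course", "synthetic_in_domain"),
    ]
    for demo in select_demonstrations("How many courses are there?", out_pool, syn_pool):
        print(f"[{demo.source}] {demo.question} -> {demo.sql}")
```

In this sketch the two pools are ranked independently and their top results are concatenated, so the prompt always contains examples from both sources. A practical system would likely also attach each demonstration's database schema and use a stronger similarity measure.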