CL Oct 21, 2022

Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

arXiv:2210.12096v1294 citations h-index: 69
Originality: Incremental advance
AI Analysis

This work addresses training-data scarcity for researchers and practitioners in conversational AI and database querying, but the advance is incremental because it builds on existing self-play and text-to-SQL methods.

The paper tackles the challenge of training data scarcity and cross-domain generalization in multi-turn text-to-SQL by using self-play to synthesize new conversational interactions, which improves accuracy on SParC and CoSQL datasets.

The task of context-dependent text-to-SQL aims to convert multi-turn user utterances to formal SQL queries. This is a challenging task because training data from which to learn complex contextual dependencies is scarce and because models must generalize to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions that adapt the model to new databases. We first design a SQL-to-text model conditioned on a sampled goal query, which represents a user's intent; this model then converses with a text-to-SQL semantic parser to generate new interactions. We then filter the synthesized interactions and retrain the models with the augmented data. We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used cross-domain text-to-SQL datasets. Our analysis shows that self-play simulates various conversational thematic relations, enhances cross-domain generalization, and improves beam search.
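The self-play loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `self_play`, `keep`, `toy_user`, and `toy_parser` are hypothetical, the two "models" are hard-coded stand-ins for the trained SQL-to-text and text-to-SQL models, and the filter shown (keep an interaction only if its final predicted query equals the goal query) is one plausible criterion assumed here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user utterance, SQL predicted by the parser)


@dataclass
class Interaction:
    goal_sql: str                       # sampled goal query (the user's intent)
    turns: List[Turn] = field(default_factory=list)


def self_play(
    goal_sql: str,
    simulate_user: Callable[[str, List[Turn]], str],
    parse: Callable[[str, List[Turn]], str],
    max_turns: int = 4,
) -> Interaction:
    """Let a user simulator and a parser converse about one goal query."""
    interaction = Interaction(goal_sql)
    for _ in range(max_turns):
        utterance = simulate_user(goal_sql, interaction.turns)
        predicted = parse(utterance, interaction.turns)
        interaction.turns.append((utterance, predicted))
        if predicted == goal_sql:       # goal reached, stop the conversation
            break
    return interaction


def keep(interaction: Interaction) -> bool:
    """Assumed filter: keep interactions whose last prediction hits the goal."""
    return bool(interaction.turns) and interaction.turns[-1][1] == interaction.goal_sql


# Hard-coded stand-ins for the trained models (hypothetical, for illustration).
def toy_user(goal_sql: str, turns: List[Turn]) -> str:
    return "show all singers" if not turns else "only those older than 30"


def toy_parser(utterance: str, turns: List[Turn]) -> str:
    if utterance == "show all singers":
        return "SELECT name FROM singer"
    # Context-dependent turn: refine the previous prediction.
    return turns[-1][1] + " WHERE age > 30"


goals = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT count(*) FROM concert",
]
synthesized = [self_play(g, toy_user, toy_parser) for g in goals]
augmented = [i for i in synthesized if keep(i)]  # data added for retraining
```

In this toy run the first goal is reached in two turns and kept, while the second never matches and is filtered out. In a real pipeline the stand-ins would be replaced by the trained models, and the kept interactions would be added to the training data before retraining.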

Code Implementations: 1 repo