CL, Oct 16, 2023

Contextual Data Augmentation for Task-Oriented Dialog Systems

arXiv:2310.10380v1, 2 citations, h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses the data scarcity problem faced by researchers and developers building task-oriented dialog systems. It is an incremental but practical improvement over existing augmentation methods.

The paper tackles the bottleneck of limited annotated dialog data for task-oriented dialog systems by developing a contextual data augmentation model that generates user turns conditioned on full dialog context, achieving up to 8% improvement in dialog success rate on MultiWoZ and SGD benchmarks.

Collection of annotated dialogs for training task-oriented dialog systems has been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident whether similar generative models can be used to generate the large variety of often unexpected user inputs that real dialog systems encounter in practice. Existing data augmentation techniques such as paraphrase generation do not take the dialog context into consideration. In this paper, we develop a novel dialog augmentation model that generates a user turn, conditioning on the full dialog context. Additionally, with a new prompt design for the language model and output re-ranking, the dialogs generated by our model can be used directly to train downstream dialog systems. On the common benchmark datasets MultiWoZ and SGD, we show that our dialog augmentation model generates high-quality dialogs and improves dialog success rate by as much as 8% over the baseline.
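
The abstract names two steps, prompting a language model with the full dialog context and re-ranking its outputs, but does not spell out either one. The Python sketch below only illustrates the general shape of such a pipeline and is not the paper's method. The `Turn` type, the `speaker: text` prompt layout, and the word-overlap score used for re-ranking are all assumptions introduced here; the candidate user turns would come from whatever generator is being used.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "user" or "agent" (hypothetical labels)
    text: str


def build_prompt(context, next_speaker="user"):
    """Flatten the whole dialog history into one prompt that ends
    with an open slot for the next user turn (assumed layout)."""
    lines = [f"{t.speaker}: {t.text}" for t in context]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


def overlap_score(candidate, context):
    """Stand-in relevance score: the fraction of candidate words that
    also occur in the context. The paper's actual criterion is not
    given in the abstract."""
    context_words = {w for t in context for w in t.text.lower().split()}
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    hits = sum(w in context_words for w in candidate_words)
    return hits / len(candidate_words)


def rerank(candidates, context, score_fn=overlap_score):
    """Order generated user turns from highest to lowest score."""
    return sorted(candidates, key=lambda c: score_fn(c, context), reverse=True)


context = [
    Turn("user", "I need a cheap hotel in the north"),
    Turn("agent", "I found two cheap hotels in the north. Do you need parking?"),
]
prompt = build_prompt(context)
# Candidates would normally be sampled from a language model given `prompt`.
candidates = ["What is the weather on Mars?", "Yes, I need parking"]
ranked = rerank(candidates, context)
```

In a real setup the overlap score would be replaced by a stronger signal, and only the top-ranked turns would be added to the training data.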

