CL · AI · Oct 30, 2022

Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues

Tencent
arXiv:2210.16838v1294 citations
h-index: 65
Originality: Incremental advance
AI Analysis

This work addresses the labor-intensive challenge of collecting diverse dialogue datasets for open-domain dialogue systems. It is an incremental advance, since it builds on existing data augmentation techniques.

The paper tackles the problem of constructing open-domain dialogue systems by proposing a data augmentation method that automatically generates high-quality, semantically diverse responses through counterfactual inference and selection, outperforming baselines on multiple downstream tasks.

The construction of open-domain dialogue systems requires high-quality dialogue datasets. Dialogue data admits a wide variety of responses for a given dialogue history, especially responses with different semantics. However, collecting such a high-quality dataset is, in most scenarios, labor-intensive and time-consuming. In this paper, we propose a data augmentation method that automatically augments high-quality responses with different semantics by counterfactual inference. Specifically, given an observed dialogue, our counterfactual generation model first infers semantically different responses by replacing the observed reply perspective with substituted ones. Our data selection method then filters out detrimental augmented responses. Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.
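
The two-stage flow described in the abstract (generate responses under substituted reply perspectives, then filter out detrimental ones) can be illustrated with a minimal sketch. This is not the paper's implementation: the `Dialogue` type, the `augment` function, the string-valued perspective labels, the threshold, and the toy generator and scorer are all hypothetical stand-ins chosen for illustration. In the paper, generation is done by a learned counterfactual model and selection by a dedicated data selection method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dialogue:
    history: List[str]   # previous turns
    response: str        # reply to the last turn
    perspective: str     # hypothetical label for the reply perspective


def augment(
    observed: Dialogue,
    candidate_perspectives: List[str],
    generate: Callable[[List[str], str], str],
    score: Callable[[List[str], str], float],
    threshold: float,
) -> List[Dialogue]:
    """Generate one response per substituted perspective, then keep
    only responses whose quality score reaches the threshold."""
    kept = []
    for perspective in candidate_perspectives:
        if perspective == observed.perspective:
            continue  # only substituted perspectives are of interest
        response = generate(observed.history, perspective)
        if response == observed.response:
            continue  # no new response was produced
        if score(observed.history, response) >= threshold:
            kept.append(Dialogue(observed.history, response, perspective))
    return kept


# Toy stand-ins (assumptions): a lookup table instead of a generation
# model, and a length-based score instead of a learned selector.
TOY_REPLIES = {
    "sports": "I play football with friends.",
    "food": "I usually cook pasta on weekends.",
    "travel": "I would rather go hiking somewhere.",
    "short": "Ok.",
}


def toy_generate(history: List[str], perspective: str) -> str:
    return TOY_REPLIES[perspective]


def toy_score(history: List[str], response: str) -> float:
    # Longer replies score higher, capped at 1.0.
    return min(1.0, len(response.split()) / 5)


observed = Dialogue(
    history=["What do you do in your free time?"],
    response="I play football with friends.",
    perspective="sports",
)
augmented = augment(
    observed,
    ["sports", "food", "travel", "short"],
    toy_generate,
    toy_score,
    threshold=0.5,
)
```

In this toy run, the "sports" candidate is skipped because it matches the observed perspective, and the "short" candidate is dropped by the filter, so `augmented` holds the "food" and "travel" responses. Keeping generation and selection as separate callables mirrors the abstract's split between the counterfactual generation model and the data selection method.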

Code Implementations: 1 repo