CL · AI · Apr 28, 2017

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

arXiv:1704.08966v2 · 33 citations
Originality: Synthesis-oriented
AI Analysis

The paper offers an incremental improvement to neural conversational models by addressing data-quality issues in noisy datasets such as movie subtitles.

The paper tackles the problem of noisy dialogue corpora by introducing a weighting model that assigns a quality score to each training example, which improves the performance of retrieval-based conversational models trained on subtitles.

Neural conversational models require substantial amounts of dialogue data for their parameter estimation and are therefore usually learned on large corpora such as chat forums or movie subtitles. These corpora are, however, often challenging to work with, notably due to their frequent lack of turn segmentation and the presence of multiple references external to the dialogue itself. This paper shows that these challenges can be mitigated by adding a weighting model to the architecture. The weighting model, which is itself estimated from dialogue data, associates each training example with a numerical weight that reflects its intrinsic quality for dialogue modelling. At training time, these sample weights are included in the empirical loss to be minimised. Evaluation results on retrieval-based models trained on movie and TV subtitles demonstrate that the inclusion of such a weighting model improves model performance on unsupervised metrics.
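
The idea of folding sample weights into the empirical loss can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `weighted_empirical_loss`, the example values, and the choice to normalise by the number of examples are all assumptions made for illustration. The per-example losses would come from the conversational model, and the weights from the separately estimated weighting model.

```python
from typing import Sequence


def weighted_empirical_loss(losses: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Average of per-example losses, each scaled by its quality weight.

    Dividing by the number of examples is an assumption of this sketch;
    other normalisations (e.g. by the sum of weights) are equally possible.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    if not losses:
        raise ValueError("at least one training example is required")
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Hypothetical values: three training examples with their losses and
# the quality weights a weighting model might assign to them.
losses = [0.5, 2.0, 1.0]
weights = [1.0, 0.0, 0.5]   # the second example is judged to be pure noise

weighted = weighted_empirical_loss(losses, weights)
unweighted = weighted_empirical_loss(losses, [1.0] * len(losses))
```

In this toy case the noisy second example, which has the largest loss, contributes nothing to the weighted objective, whereas with uniform weights it would dominate the average.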
