CL Jun 29, 2017

Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation

arXiv:1706.09799v1, 231 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the evaluation problem for researchers in dialogue systems, but it is incremental as it confirms an expected hypothesis.

The study tests whether automated metrics such as BLEU correlate better with human judgment in task-oriented dialogue generation than in non-task-oriented settings. It finds stronger correlations, especially when multiple reference sentences are available, and argues for more challenging datasets because some current corpora can be solved with simple models.

Automated metrics such as BLEU are widely used in the machine translation literature. They have also been used recently in the dialogue community for evaluating dialogue response generation. However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non task-oriented dialogue setting. Task-oriented dialogue responses are expressed on narrower domains and exhibit lower diversity. It is thus reasonable to think that these automated metrics would correlate well with human judgment in the task-oriented setting where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to confirm whether this is the case. Our findings indicate that these automated metrics have stronger correlation with human judgments in the task-oriented setting compared to what has been observed in the non task-oriented setting. We also observe that these metrics correlate even better for datasets which provide multiple ground truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models and advocate for more challenging datasets.
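To make the evaluation setup described in the abstract concrete, here is a minimal sketch of the two ingredients involved: a word-overlap score computed against one or more reference sentences, and a correlation between such scores and human ratings. It is an illustration under stated assumptions, not the authors' code. The function names are hypothetical, the score is only the clipped n-gram precision that BLEU builds on (no brevity penalty, no averaging over n-gram orders), and Pearson correlation is an assumed choice, since the abstract does not say which coefficient was used.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision against one or more references.

    Hypothetical helper: this is only the precision component that
    BLEU is built on, not full BLEU.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    # For each n-gram, keep the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Each candidate n-gram is credited at most as often as it appears
    # in the most generous reference.
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Assumed choice of coefficient; inputs must not be constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, one would score each generated response with clipped_precision, collect a human rating for the same response, and pass the two lists to pearson. Because clipping takes the maximum count over all references, adding a reference can only keep the score the same or raise it for a valid paraphrase, which is one intuition for why multiple ground-truth references help.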
