CL · Mar 25, 2022

What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation

Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, Dilek Hakkani-Tur

arXiv:2203.13927v132.2641 citations · h-index: 61 · Has Code

Originality: Incremental advance

AI Analysis

The work addresses the cumbersome collection of human-annotated data for training dialog evaluation metrics, offering a scalable alternative for researchers and developers in conversational AI. It is an incremental advance because it builds on existing model-based metrics.

The paper tackles automatic evaluation of open-domain dialogs by using sentiment and conversation-ending cues from the next user utterance as weak supervision, which removes the need for manual annotation of system turns. The resulting model performs comparably to models trained on human-annotated data and generalizes across spoken and written corpora.

Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
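The weak-supervision idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the word lists, the `Turn` class, and the functions `toy_sentiment`, `ends_conversation`, and `weak_labels` are hypothetical names introduced here, and a simple keyword lookup stands in for whatever sentiment signal the paper actually uses. The sketch only shows how a label for a system response could be derived from the user turn that follows it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy cue lists. These are assumptions for illustration, not taken from the paper.
NEGATIVE_WORDS = {"wrong", "stupid", "bad", "boring", "terrible"}
POSITIVE_WORDS = {"great", "cool", "thanks", "interesting", "nice"}
END_WORDS = {"stop", "goodbye", "bye", "exit"}


@dataclass
class Turn:
    speaker: str  # "system" or "user"
    text: str


def _tokens(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [t.strip(".,!?") for t in text.lower().split()]


def toy_sentiment(text: str) -> int:
    """Positive-word count minus negative-word count (a stand-in sentiment score)."""
    tokens = _tokens(text)
    positives = sum(t in POSITIVE_WORDS for t in tokens)
    negatives = sum(t in NEGATIVE_WORDS for t in tokens)
    return positives - negatives


def ends_conversation(text: str) -> bool:
    """True if the user utterance contains an explicit ending cue."""
    return any(t in END_WORDS for t in _tokens(text))


def weak_labels(dialog: List[Turn]) -> List[Tuple[str, int]]:
    """Pair each system response with a weak label from the next user turn.

    Label 0 marks a likely poor response (negative sentiment or the user
    ends the conversation); label 1 marks everything else.
    """
    labeled = []
    for current, following in zip(dialog, dialog[1:]):
        if current.speaker != "system" or following.speaker != "user":
            continue
        poor = ends_conversation(following.text) or toy_sentiment(following.text) < 0
        labeled.append((current.text, 0 if poor else 1))
    return labeled


if __name__ == "__main__":
    example = [
        Turn("user", "Tell me about movies."),
        Turn("system", "I like turtles."),
        Turn("user", "What is wrong with you?"),
        Turn("system", "Sorry. Inception is a popular science fiction film."),
        Turn("user", "Cool, thanks!"),
    ]
    for response, label in weak_labels(example):
        print(label, response)
```

Labels produced this way are noisy, since a user may end a conversation for reasons unrelated to the last response. That noise is why the abstract describes the signal as a proxy and the training as weak supervision.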
