CL · Nov 4, 2019

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

arXiv:1911.01456v2 · 62 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient automatic evaluation metrics in open-domain dialogue systems, offering a potential real-time feedback mechanism for training better models, though it is incremental as it builds on prior engagement-focused methods.

The paper proposes predictive engagement, a metric for evaluating open-domain dialogue systems that estimates engagement for each utterance and aggregates those estimates to predict conversation-level engagement. Human annotators show high agreement on utterance-level scores, and the metric correlates with human judgments.

User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can improve automatic evaluation metrics for open-domain dialogue systems, as shown by correlation with human judgements. This suggests that predictive engagement can be used as real-time feedback for training better dialogue models.
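
The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes utterance-level engagement scores are already available as numbers in [0, 1] and uses the arithmetic mean as the default aggregator. The function name `conversation_engagement`, the score range, and the choice of mean are assumptions made for illustration; the abstract says only that the scores are "properly aggregated".

```python
from statistics import mean


def conversation_engagement(utterance_scores, aggregate=mean):
    """Combine per-utterance engagement scores into one conversation-level score.

    utterance_scores: iterable of floats, assumed here to lie in [0, 1].
    aggregate: callable that reduces a list of floats to a single float
               (mean by default; this default is an assumption, not the paper's stated choice).
    """
    scores = list(utterance_scores)
    if not scores:
        raise ValueError("need at least one utterance score")
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of assumed [0, 1] range: {s}")
    return aggregate(scores)


if __name__ == "__main__":
    # Hypothetical scores for a four-utterance conversation.
    example_scores = [0.9, 0.4, 0.8, 0.7]
    print(conversation_engagement(example_scores))
    # Swapping the aggregator shows how the conversation-level score depends on it.
    print(conversation_engagement(example_scores, aggregate=min))
```

Passing a different `aggregate` callable (for example `min`, `max`, or `statistics.median`) makes it easy to compare aggregation choices against human conversation-level ratings.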

Code Implementations: 2 repos
