CL | Jan 4, 2024

Rethinking Response Evaluation from Interlocutor's Eye for Open-Domain Dialogue Systems

Yuma Tsuta, Naoki Yoshinaga, Shoetsu Sato, Masashi Toyoda

arXiv:2401.02256v128.0124 citations | h-index: 18 | IJCNLP

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate automatic evaluation methods for dialogue systems. The advance is incremental because it builds on known challenges in dialogue evaluation.

The study examines how to evaluate open-domain dialogue systems from the interlocutor's perspective. It finds that interlocutor awareness is critical for making automatic evaluations correlate with the interlocutor's judgments, and that dialogue continuity prediction can train such an evaluator without human feedback. Evaluating generated responses, however, remains harder than evaluating human responses.

Open-domain dialogue systems have started to engage in continuous conversations with humans. Those dialogue systems are required to be adjusted to the human interlocutor and evaluated in terms of their perspective. However, it is questionable whether the current automatic evaluation methods can approximate the interlocutor's judgments. In this study, we analyzed and examined what features are needed in an automatic response evaluator from the interlocutor's perspective. The first experiment on the Hazumi dataset revealed that interlocutor awareness plays a critical role in making automatic response evaluation correlate with the interlocutor's judgments. The second experiment using massive conversations on X (formerly Twitter) confirmed that dialogue continuity prediction can train an interlocutor-aware response evaluator without human feedback while revealing the difficulty in evaluating generated responses compared to human responses.
