CL · Jun 2, 2021

DynaEval: Unifying Turn and Dialogue Level Evaluation

arXiv:2106.01112v3719 citations
AI Analysis

DynaEval addresses the need for more holistic evaluation metrics for dialogue systems. It offers a unified approach that improves on existing methods, though the contribution is incremental.

The authors tackle dialogue evaluation with DynaEval, a framework that unifies turn-level and dialogue-level assessment using a graph convolutional network and a contrastive loss. It significantly outperforms the state-of-the-art dialogue coherence model and correlates strongly with human judgments.

A dialogue is essentially a multi-turn interaction among interlocutors, and effective evaluation metrics should reflect the dynamics of that interaction. Existing automatic metrics focus heavily on turn-level quality and ignore these dynamics. To address this, we propose DynaEval, a unified automatic evaluation framework that not only performs turn-level evaluation but also holistically assesses the quality of the entire dialogue. DynaEval uses a graph convolutional network (GCN) to model the dialogue as a whole: each graph node represents an individual utterance, and each edge represents the dependency between a pair of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model and correlates strongly with human judgements across multiple dialogue evaluation aspects at both the turn and dialogue level.
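The abstract describes the pipeline only at a high level. The following is a minimal NumPy sketch of the general idea, not the authors' implementation. It makes several assumptions: utterance embeddings are pre-computed, edges link utterances within a fixed context `window`, and scoring uses two plain (non-relational) GCN layers with mean pooling and a hypothetical linear scoring vector `v`. For negative sampling it uses two illustrative schemes: replacing one utterance with one from another dialogue, and shuffling one speaker's turns in a strictly alternating two-party dialogue. A margin-ranking hinge stands in for the contrastive loss. All function and parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_adjacency(n_utts, window):
    """Utterance graph (assumption): connect utterances at most `window` turns apart.
    Self-loops are included; returns D^-1/2 A D^-1/2 as in a standard GCN."""
    idx = np.arange(n_utts)
    a = (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(h, a_hat, w):
    """One GCN layer: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)

def dialogue_score(utt_emb, params, window=2):
    """Score a whole dialogue: two GCN layers, mean-pool nodes, linear readout."""
    a_hat = normalized_adjacency(len(utt_emb), window)
    h = gcn_layer(utt_emb, a_hat, params["w1"])
    h = gcn_layer(h, a_hat, params["w2"])
    return float(h.mean(axis=0) @ params["v"])  # "v" is a hypothetical readout vector

def utterance_replacement(dialogue, other_dialogues, rng):
    """Negative sample: swap one utterance for a random utterance from another dialogue."""
    neg = dialogue.copy()
    other = other_dialogues[rng.integers(len(other_dialogues))]
    neg[rng.integers(len(dialogue))] = other[rng.integers(len(other))]
    return neg

def speaker_shuffle(dialogue, rng, speaker=0):
    """Negative sample: permute one speaker's turns (assumes strictly alternating speakers)."""
    neg = dialogue.copy()
    idx = np.arange(speaker, len(dialogue), 2)
    neg[idx] = dialogue[rng.permutation(idx)]
    return neg

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss: zero once the positive outscores the negative by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Toy data: 4 random "dialogues" of 6 utterances with 8-dim embeddings (untrained weights).
dim, hidden = 8, 16
params = {
    "w1": rng.normal(size=(dim, hidden)) * 0.1,
    "w2": rng.normal(size=(hidden, hidden)) * 0.1,
    "v": rng.normal(size=hidden),
}
corpus = [rng.normal(size=(6, dim)) for _ in range(4)]
pos = corpus[0]
neg = utterance_replacement(pos, corpus[1:], rng)
loss = margin_ranking_loss(dialogue_score(pos, params), dialogue_score(neg, params))
print(f"loss = {loss:.4f}")
```

Training would minimize this loss over many positive/negative pairs, pushing well-formed dialogues to score higher than their corrupted versions by at least `margin`. The random weights here only demonstrate the data flow. Restricting edges to a context window keeps the normalized adjacency from collapsing into a uniform average, which would happen if every utterance were connected to every other one.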

Code Implementations: 1 repo