CL · Jun 27, 2023

C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Microsoft
arXiv:2306.15245v3224 citations · h-index: 24 · Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need for better chatbot evaluation metrics. It is incremental, however: it builds on an existing metric (FED) and changes only its scorer component.

Reference-free turn-level dialogue metrics often correlate poorly with human judgments. The paper addresses this with a model-agnostic scorer based on Conditional Pointwise Mutual Information (C-PMI), which yields a relative 62.6% higher Spearman correlation on average for the FED evaluation metric on the FED dataset.

Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 62.6% higher Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
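The following is a minimal sketch of the scorer swap, not the authors' implementation. It assumes the scorer compares how likely a user follow-up u is with and without the system response r, given the dialogue context c. The exact conditioning variables used in the paper and in the linked repository may differ. The log-probabilities are made-up stand-ins for a dialogue language model's outputs, and every function name is hypothetical. The sketch contrasts a likelihood-only scorer with conditional PMI, scores each against human ratings using Spearman's rho, and adds a helper that shows what a "relative" gain means.

```python
import math
from typing import List, Sequence


def loglik_score(logp_u_given_cr: float) -> float:
    """Likelihood-style scorer: log p(u | c, r), i.e. the negative NLL of the
    follow-up u given context c and system response r (higher = better)."""
    return logp_u_given_cr


def c_pmi_score(logp_u_given_cr: float, logp_u_given_c: float) -> float:
    """Conditional PMI of response r and follow-up u given context c:
    PMI(r; u | c) = log p(u | c, r) - log p(u | c).
    Subtracting log p(u | c) removes how likely u was regardless of r."""
    return logp_u_given_cr - logp_u_given_c


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement; 0.626 would mean 62.6% higher than `old`."""
    return (new - old) / abs(old)


# Made-up turns: (log p(u | c, r), log p(u | c), human rating).
turns = [(-2.0, -1.8, 1.0), (-3.0, -4.5, 5.0), (-2.5, -3.0, 3.0), (-4.0, -5.0, 4.0)]
human = [h for _, _, h in turns]
ll = [loglik_score(a) for a, _, _ in turns]
pmi = [c_pmi_score(a, b) for a, b, _ in turns]
print(round(spearman(ll, human), 3), round(spearman(pmi, human), 3))  # -0.8 1.0
```

For a fixed context and follow-up, log p(u | c) is the same for every candidate response. The two scorers therefore rank responses within one dialogue identically. They diverge across dialogues, where a follow-up may be likely or unlikely regardless of what the system said. In the toy data above, that prior effect dominates the likelihood-only score.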

Code Implementations: 1 repo