CL · May 8, 2023

DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation

arXiv:2305.04720v2224 citations · Has Code
AI Analysis

This paper addresses the unreliability of evaluation metrics for open-domain dialogue systems and offers researchers and developers a more robust alternative. The contribution is incremental, since it builds on existing neural classifier approaches.

The paper proposes DEnsity, a metric for evaluating open-domain dialogue systems. It applies density estimation to features from a neural classifier to assess how likely a response is under the distribution of human conversations. In the reported experiments, DEnsity correlates better with human evaluations than existing metrics.

Despite the recent advances in open-domain dialogue systems, building a reliable evaluation metric is still a challenging problem. Recent studies proposed learnable metrics based on classification models trained to distinguish the correct response. However, neural classifiers are known to make overly confident predictions for examples from unseen distributions. We propose DEnsity, which evaluates a response by utilizing density estimation on the feature space derived from a neural classifier. Our metric measures how likely a response would appear in the distribution of human conversations. Moreover, to improve the performance of DEnsity, we utilize contrastive learning to further compress the feature space. Experiments on multiple response evaluation datasets show that DEnsity correlates better with human evaluations than the existing metrics. Our code is available at https://github.com/ddehun/DEnsity.
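The idea of scoring a response by its density in a feature space can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a single Gaussian as the density estimator and uses random vectors as stand-ins for classifier features; the function names `fit_gaussian` and `density_score` are hypothetical.

```python
import numpy as np


def fit_gaussian(features, ridge=1e-3):
    """Fit one Gaussian to reference features of shape (n_samples, dim).

    Assumption: a single Gaussian is used purely for illustration.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / len(features)
    # A small ridge term keeps the covariance matrix invertible.
    cov += ridge * np.eye(features.shape[1])
    precision = np.linalg.inv(cov)
    return mean, precision


def density_score(feature, mean, precision):
    """Return the negative squared Mahalanobis distance.

    Up to a constant and a factor of 1/2, this is the Gaussian
    log-density, so a higher score means the feature lies closer to
    the reference (human conversation) distribution.
    """
    diff = feature - mean
    return -float(diff @ precision @ diff)


# Stand-in data: random vectors instead of real classifier features.
rng = np.random.default_rng(0)
human_features = rng.normal(0.0, 1.0, size=(500, 8))
mean, precision = fit_gaussian(human_features)

typical = np.zeros(8)        # near the centre of the reference data
outlier = np.full(8, 6.0)    # far from anything seen in the reference data

score_typical = density_score(typical, mean, precision)
score_outlier = density_score(outlier, mean, precision)
```

In this sketch a response whose feature vector lies far from the reference data receives a much lower score than one near the centre. Unlike a classifier's softmax confidence, the score keeps decreasing as the input moves further from the reference distribution.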

Code Implementations: 1 repo
