SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
This work addresses the need for automated, fine-grained evaluation in dialogue systems by offering a novel self-supervised approach that yields an incremental improvement in evaluation accuracy.
The paper tackles the problem of fine-grained dialogue evaluation by proposing a self-supervised framework that models the correlation between turn and dialogue quality, achieving high consistency with human evaluations and outperforming state-of-the-art models on multiple benchmarks.
This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval). The core idea is to model the correlation between the quality of individual turns and the quality of the entire dialogue. We first propose an automatic data construction method that assigns fine-grained scores to arbitrary dialogue data. We then train SelF-Eval with a multi-level contrastive learning schema that helps it distinguish different score levels. Experimental results on multiple benchmarks show that SelF-Eval is highly consistent with human evaluations and outperforms state-of-the-art models. We also give a detailed analysis of the experiments. Our code is available on GitHub.
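The abstract does not spell out how scores are assigned or how the contrastive objective is defined, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that a graded sample is built by replacing k turns of a dialogue with random distractor utterances, that the score is the fraction of turns left intact, and that "multi-level contrastive" can be approximated by a pairwise margin loss over score levels. The names `build_graded_samples` and `multi_level_margin_loss` are hypothetical.

```python
import random


def build_graded_samples(dialogue, distractors, levels, rng):
    """Return (corrupted_dialogue, score) pairs, one per corruption level.

    Assumption (not from the paper): level k replaces k randomly chosen
    turns with distractor utterances, and the score is the fraction of
    original turns that remain.
    """
    n = len(dialogue)
    samples = []
    for k in levels:
        replaced = set(rng.sample(range(n), k))
        corrupted = [
            rng.choice(distractors) if i in replaced else turn
            for i, turn in enumerate(dialogue)
        ]
        samples.append((corrupted, 1.0 - k / n))
    return samples


def multi_level_margin_loss(pred_scores, target_scores, margin=0.1):
    """Average pairwise hinge loss over all pairs with different targets.

    For every pair where sample i has a higher target score than sample j,
    the predicted score of i should exceed that of j by at least `margin`.
    """
    total, pairs = 0.0, 0
    for i, target_i in enumerate(target_scores):
        for j, target_j in enumerate(target_scores):
            if target_i > target_j:
                total += max(0.0, margin - (pred_scores[i] - pred_scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


# Usage: four score levels built from a single four-turn dialogue.
rng = random.Random(0)
dialogue = ["Hi!", "Hello, how are you?", "Fine, thanks.", "Glad to hear it."]
distractors = ["The train leaves at noon.", "I prefer green tea."]
samples = build_graded_samples(dialogue, distractors, [0, 1, 2, 4], rng)
targets = [score for _, score in samples]
```

In a real training setup the predicted scores would come from the evaluation model, and the loss would be computed on tensors so that gradients can flow; the pure-Python version here only shows the ordering constraint between score levels.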