CLSep 11, 2023

Hi Model, generating 'nice' instead of 'good' is not as bad as generating 'rice'! Towards Context and Semantic Infused Dialogue Generation Loss Function and Evaluation Metric

arXiv:2309.05804v21 citationsh-index: 105
Originality Incremental advance
AI Analysis

This work addresses the problem of outdated evaluation methods in dialogue generation for researchers and practitioners, though it is incremental as it builds on existing dialogue modeling frameworks.

The paper tackles the limitations of lexical-based loss functions and evaluation metrics in dialogue generation by proposing a new loss function (SemTextualLogue) and metric (Dialuation) that incorporate semantics and context. The models trained with SemTextualLogue achieved superior performance, and Dialuation correlated more strongly with human evaluation than traditional metrics.

Over the past two decades, dialogue modeling has made significant strides, moving from simple rule-based responses to personalized and persuasive response generation. However, despite these advancements, the objective functions and evaluation metrics for dialogue generation have remained stagnant. These lexical-based metrics, e.g., cross-entropy and BLEU, have two key limitations: (a) word-to-word matching without semantic consideration: It assigns the same credit for failure to generate "nice" and "rice" for "good", (b) missing context attribute for evaluating the generated response: Even if a generated response is relevant to the ongoing dialogue context, it may still be penalized for not matching the gold utterance provided in the corpus. In this paper, we first investigate these limitations comprehensively and propose a new loss function called Semantic Infused Contextualized diaLogue (SemTextualLogue) loss function. We also formulate an evaluation metric called Dialuation, incorporating both context and semantic relevance. We experimented with both non-pretrained and pre-trained models on two dialogue corpora, encompassing task-oriented and open-domain scenarios. We found that the dialogue generation models trained with SemTextualLogueloss attained superior performance compared to the traditional cross-entropy loss function. The findings establish that the effective training of a dialogue generation model hinges significantly on incorporating semantics and context. This pattern is also mirrored in the introduced Dialuation metric, where the consideration of both context and semantics correlates more strongly with human evaluation compared to traditional metrics.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes