CL · AI · Oct 28, 2023

Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation

Amazon
arXiv:2310.18794v3 · 6 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses the problem of unreliable responses in AI dialogue systems. Its contribution is incremental, since it builds on existing decoding-time mitigation approaches.

The paper tackles hallucination in knowledge-grounded dialogue generation by proposing sequence-level certainty as a common theme and showing that higher certainty correlates with lower hallucination. It introduces Certainty-based Response Ranking (CRR) methods that reduce hallucination by up to 15% in experiments across multiple datasets and models.

In this work, we propose sequence-level certainty as a common theme over hallucination in Knowledge-Grounded Dialogue Generation (KGDG). We explore the correlation between the level of hallucination in model responses and two types of sequence-level certainty: probabilistic certainty and semantic certainty. Empirical results reveal that higher levels of both types of certainty in model responses are correlated with lower levels of hallucination. We further propose Certainty-based Response Ranking (CRR), a decoding-time hallucination mitigation method that samples several response candidates, ranks them based on sequence-level certainty, and outputs the response with the highest certainty level. Aligning with our definitions of sequence-level certainty, we design two types of CRR approaches: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses using the arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from the meaning space, and ranks model response candidates based on their semantic certainty level as measured by an entailment-based Agreement Score (AS). Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and 4 KGDG models, we validate the effectiveness of CRR for reducing hallucination in the KGDG task.
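The two ranking schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`mean_log_prob`, `p_crr`, `agreement_scores`, `s_crr`) and the `entails` callable are hypothetical. The abstract says only that the Agreement Score is entailment-based, so the sketch assumes a candidate's score is the number of other candidates that entail it; the paper's exact definition may differ. In practice `entails` would wrap a natural language inference model; here it is left as a plain callable.

```python
from typing import Callable, List, Sequence


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Arithmetic mean of the per-token log-probabilities of one response."""
    if not token_log_probs:
        raise ValueError("a response needs at least one token")
    return sum(token_log_probs) / len(token_log_probs)


def p_crr(candidates: List[str], log_probs: List[Sequence[float]]) -> str:
    """P-CRR sketch: return the candidate with the highest mean log-probability."""
    if not candidates or len(candidates) != len(log_probs):
        raise ValueError("need one log-probability sequence per candidate")
    scores = [mean_log_prob(lp) for lp in log_probs]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


def agreement_scores(
    candidates: List[str],
    entails: Callable[[str, str], bool],
) -> List[int]:
    """Assumed Agreement Score: for each candidate, the number of OTHER
    candidates that entail it. `entails(premise, hypothesis)` is a
    hypothetical stand-in for an NLI model's entailment decision."""
    return [
        sum(
            1
            for j, other in enumerate(candidates)
            if j != i and entails(other, cand)
        )
        for i, cand in enumerate(candidates)
    ]


def s_crr(candidates: List[str], entails: Callable[[str, str], bool]) -> str:
    """S-CRR sketch: return the candidate with the highest agreement score
    (ties resolve to the earliest candidate)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = agreement_scores(candidates, entails)
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```

Both functions take candidates that have already been sampled, which matches the abstract's description of CRR as a decoding-time step: sample several responses, score each one, and output the top-ranked response.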

