CL · Sep 25, 2024

Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

arXiv:2409.16658v1 · 24 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses text hallucination in language models, a problem for users who rely on accurate AI-generated content. It offers a method to improve faithfulness, though the contribution appears incremental because it builds on known model behaviors.

The paper demonstrates that pre-trained language models produce statistically distinguishable probability and uncertainty distributions for unfaithfully hallucinated texts across 24 models and 6 datasets, with 88-98% of cases showing this pattern. It introduces a hallucination-reducing training algorithm that improves faithfulness metrics while preserving general text quality.

In this work, we show that pre-trained language models return distinguishable generation probability and uncertainty distributions for unfaithfully hallucinated texts, regardless of model size and structure. By examining 24 models on 6 datasets, we find that 88-98% of cases return statistically significantly distinguishable generation probability and uncertainty distributions. Using this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms other baselines by achieving higher faithfulness metrics while maintaining sound general text quality measures.
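
The kind of comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the listing does not say which statistical test the authors use, so a two-sided permutation test on group means stands in for it; the helper names (`sequence_log_prob`, `sequence_entropy`, `permutation_p_value`, `toy_text`) are hypothetical; and the per-token distributions are invented toy values, not model outputs or results from the paper.

```python
import math
import random
from statistics import mean


def sequence_log_prob(step_dists, token_ids):
    """Average log-probability assigned to the tokens that appear in the text."""
    return mean(math.log(dist[tok]) for dist, tok in zip(step_dists, token_ids))


def sequence_entropy(step_dists):
    """Average Shannon entropy (in nats) of the per-step next-token distributions."""
    return mean(-sum(p * math.log(p) for p in dist if p > 0) for dist in step_dists)


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Assumption: this test is a stand-in; the source does not name its test.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


def toy_text(peak, length=5, vocab=4):
    """Hypothetical text: every step puts `peak` mass on token 0, the rest spread evenly."""
    rest = (1.0 - peak) / (vocab - 1)
    dist = [peak] + [rest] * (vocab - 1)
    return [dist] * length, [0] * length


# Invented toy data: "faithful" texts get peaked distributions, "hallucinated" ones flatter.
faithful = [toy_text(p) for p in (0.90, 0.85, 0.80, 0.88, 0.92, 0.83)]
hallucinated = [toy_text(p) for p in (0.55, 0.60, 0.50, 0.65, 0.58, 0.52)]

faithful_lp = [sequence_log_prob(d, t) for d, t in faithful]
halluc_lp = [sequence_log_prob(d, t) for d, t in hallucinated]
faithful_ent = [sequence_entropy(d) for d, _ in faithful]
halluc_ent = [sequence_entropy(d) for d, _ in hallucinated]

p_log_prob = permutation_p_value(faithful_lp, halluc_lp)
p_entropy = permutation_p_value(faithful_ent, halluc_ent)
print(f"log-prob p-value: {p_log_prob:.4f}, entropy p-value: {p_entropy:.4f}")
```

With real models, the per-step distributions would come from the model's softmax outputs over the vocabulary for each reference or generated text, and the two groups would be texts labeled faithful and unfaithfully hallucinated.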

Code Implementations: 1 repo