AI · CL · Nov 14, 2024

Probing LLM Hallucination from Within: Perturbation-Driven Approach via Internal Knowledge

Georgia Tech
arXiv:2411.09689v4 · 2 citations · h-index: 48
Originality: Highly original
AI Analysis

This work addresses LLM hallucination, a critical obstacle to practical applications, with a more accurate detection method that needs no external knowledge or supervised training.

The paper tackles LLM hallucination by introducing a new task, hallucination probing, which classifies generated text as aligned, misaligned, or fabricated. It also proposes SHINE, a method that achieves state-of-the-art hallucination detection, outperforming seven competing methods across four datasets and four LLMs.

LLM hallucination, where unfaithful text is generated, presents a critical challenge for LLMs' practical applications. Current detection methods often resort to external knowledge, LLM fine-tuning, or supervised training with large hallucination-labeled datasets. Moreover, these approaches do not distinguish between different types of hallucinations, which is crucial for enhancing detection performance. To address these limitations, we introduce hallucination probing, a new task that classifies LLM-generated text into three categories: aligned, misaligned, and fabricated. Driven by our novel discovery that perturbing key entities in prompts affects an LLM's generation of these three types of text differently, we propose SHINE, a novel hallucination probing method that does not require external knowledge, supervised training, or LLM fine-tuning. SHINE is effective in hallucination probing across three modern LLMs and achieves state-of-the-art performance in hallucination detection, outperforming seven competing methods across four datasets and four LLMs, underscoring the importance of probing for accurate detection.
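
The abstract does not describe SHINE's actual procedure, so the following is only a rough sketch of the two ideas it names: the three-way label set, and measuring how a score for the generated text shifts when a key entity in the prompt is perturbed. Every name here (`ProbeLabel`, `perturb_entity`, `perturbation_shift`, `toy_score`) is hypothetical, the word-overlap score is a stand-in for whatever model-internal signal the paper uses, and treating only "aligned" text as non-hallucinated is an assumption read off the abstract.

```python
from enum import Enum
from typing import Callable, Sequence


class ProbeLabel(Enum):
    """The three categories named in the abstract."""
    ALIGNED = "aligned"
    MISALIGNED = "misaligned"
    FABRICATED = "fabricated"


def is_hallucination(label: ProbeLabel) -> bool:
    # Assumption: only "aligned" text counts as non-hallucinated.
    return label is not ProbeLabel.ALIGNED


def perturb_entity(prompt: str, entity: str, replacement: str) -> str:
    """Return the prompt with one key entity swapped for a replacement."""
    if entity not in prompt:
        raise ValueError(f"entity {entity!r} not found in prompt")
    return prompt.replace(entity, replacement)


def perturbation_shift(
    score: Callable[[str, str], float],
    prompt: str,
    generated: str,
    entity: str,
    replacements: Sequence[str],
) -> float:
    """Average drop in score(prompt, generated) after perturbing the entity."""
    if not replacements:
        raise ValueError("need at least one replacement entity")
    base = score(prompt, generated)
    drops = [
        base - score(perturb_entity(prompt, entity, r), generated)
        for r in replacements
    ]
    return sum(drops) / len(drops)


def toy_score(prompt: str, generated: str) -> float:
    # Hypothetical stand-in for a model-based score:
    # the fraction of generated words that also appear in the prompt.
    prompt_words = set(prompt.lower().split())
    gen_words = generated.lower().split()
    return sum(w in prompt_words for w in gen_words) / len(gen_words)


shift = perturbation_shift(
    toy_score,
    prompt="Where was Marie Curie born",
    generated="Marie Curie was born in Warsaw",
    entity="Marie Curie",
    replacements=["Isaac Newton", "Alan Turing"],
)
```

In a real setting, `toy_score` would presumably be replaced by something derived from the LLM itself, such as the likelihood of the generated text given the prompt. How the resulting shift maps to the three labels is the paper's contribution and is deliberately not guessed at here.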

