LG · AI · May 1

From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity

Yee Zhing Liew, Andrew Huey Ping Tan, Anwar P. P Abdul Majeed

arXiv:2605.00939 · 22.8 · h-index: 1

Predicted impact: top 80% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For LLM developers and users, this provides a more reliable way to detect high-confidence factual errors, improving trustworthiness.

The paper addresses the problem of detecting 'stubborn hallucinations' in LLMs: errors where the model is confidently wrong. The proposed method, EPGS, detects these errors by perturbing input embeddings with noise and measuring the resulting change in gradient magnitude. The authors report that it outperforms entropy-based and representation-based baselines.

Traditional hallucination detection fails on "Stubborn Hallucinations" -- errors where LLMs are confidently wrong. We propose a geometric solution: Embedding-Perturbed Gradient Sensitivity (EPGS). We hypothesize that while robust facts reside in flat minima, stubborn hallucinations sit in sharp minima, supported by brittle memorization. EPGS detects this sharpness by perturbing input embeddings with Gaussian noise and measuring the resulting spike in gradient magnitude. This acts as an efficient proxy for the Hessian spectrum, differentiating stable knowledge from unstable memorization. Our experiments show that EPGS significantly outperforms entropy-based and representation-based baselines, providing a robust signal for detecting high-confidence factual errors.
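
The flat-versus-sharp intuition in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic loss, the function name epgs_score, and the parameters sigma and n_samples are all assumptions made for illustration, and the gradient here is taken with respect to the perturbed input itself. For a quadratic loss with Hessian H and minimum at x = 0, the gradient at a perturbed point eps is H·eps, so the average gradient norm after Gaussian perturbation grows with the curvature, which is the sense in which such a score can stand in for Hessian information.

```python
import math
import random

def grad_quadratic(x, curvatures):
    """Gradient of the toy loss L(x) = 0.5 * sum(c_i * x_i**2), minimized at x = 0."""
    return [c * xi for c, xi in zip(curvatures, x)]

def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def epgs_score(grad_fn, x, sigma=0.01, n_samples=64, seed=0):
    """Hypothetical score: mean increase in gradient norm when x is
    perturbed with zero-mean Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    base = l2_norm(grad_fn(x))
    total = 0.0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += l2_norm(grad_fn(noisy)) - base
    return total / n_samples

dim = 8
x_star = [0.0] * dim      # both toy losses have their minimum here
flat = [0.1] * dim        # low curvature: stands in for a "flat" minimum
sharp = [10.0] * dim      # high curvature: stands in for a "sharp" minimum

flat_score = epgs_score(lambda x: grad_quadratic(x, flat), x_star)
sharp_score = epgs_score(lambda x: grad_quadratic(x, sharp), x_star)
print(f"flat: {flat_score:.6f}  sharp: {sharp_score:.6f}")
```

Because both calls use the same seed, they see identical noise, and the sharp score comes out larger than the flat one by the ratio of the curvatures. In an actual LLM setting the gradient would presumably come from automatic differentiation through the model at perturbed input embeddings; that part is not shown here.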
