LG · AI · May 1

From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity

arXiv:2605.00939 · 22.8 · h-index: 1
Predicted impact: top 80% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For LLM developers and users, this provides a more reliable way to detect high-confidence factual errors, improving trustworthiness.

The paper addresses the problem of detecting 'stubborn hallucinations' in LLMs: errors where the model is confidently wrong. The proposed method, EPGS, perturbs the input embeddings and measures how sharply the gradient magnitude responds; the authors report that it outperforms entropy-based and representation-based baselines.

Traditional hallucination detection fails on "Stubborn Hallucinations": errors where LLMs are confidently wrong. We propose a geometric solution: Embedding-Perturbed Gradient Sensitivity (EPGS). We hypothesize that while robust facts reside in flat minima, stubborn hallucinations sit in sharp minima, supported by brittle memorization. EPGS detects this sharpness by perturbing input embeddings with Gaussian noise and measuring the resulting spike in gradient magnitude. This acts as an efficient proxy for the Hessian spectrum, differentiating stable knowledge from unstable memorization. Our experiments show that EPGS significantly outperforms entropy-based and representation-based baselines, providing a robust signal for detecting high-confidence factual errors.
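
The perturb-and-measure idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the gradient is taken with respect to the embedding, that the score is the mean increase in gradient norm under Gaussian noise, and it substitutes a simple quadratic loss for an LLM. The names `epgs_score`, `quadratic_grad`, `sigma`, and `n_samples` are hypothetical.

```python
import math
import random


def grad_norm(grad_fn, x):
    """Euclidean norm of the gradient returned by grad_fn at point x."""
    g = grad_fn(x)
    return math.sqrt(sum(gi * gi for gi in g))


def epgs_score(grad_fn, embedding, sigma=0.1, n_samples=32, seed=0):
    """Toy sensitivity score (assumed form, not the paper's exact definition):
    mean gradient norm at Gaussian-perturbed embeddings minus the norm at
    the unperturbed embedding."""
    rng = random.Random(seed)
    base = grad_norm(grad_fn, embedding)
    total = 0.0
    for _ in range(n_samples):
        # Perturb every embedding coordinate with N(0, sigma^2) noise.
        noisy = [e + rng.gauss(0.0, sigma) for e in embedding]
        total += grad_norm(grad_fn, noisy)
    return total / n_samples - base


def quadratic_grad(curvature):
    """Gradient of the stand-in loss 0.5 * curvature * ||x||^2."""
    return lambda x: [curvature * xi for xi in x]


# Both toy losses have their minimum at the origin; only curvature differs.
minimum = [0.0] * 8
flat_score = epgs_score(quadratic_grad(0.1), minimum)    # flat minimum
sharp_score = epgs_score(quadratic_grad(10.0), minimum)  # sharp minimum
print(f"flat={flat_score:.4f} sharp={sharp_score:.4f}")
```

For this quadratic stand-in, the gradient at a perturbed point is the curvature times the noise vector, so the score grows in proportion to curvature. That is the sense in which gradient sensitivity to small input noise can serve as a proxy for Hessian sharpness without computing the Hessian itself.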

