LG AI CL | Dec 17, 2025

The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems

arXiv:2512.15068v2

Originality: Incremental advance

AI Analysis

This work addresses a safety-critical issue in AI systems by exposing fundamental limitations in current hallucination detection methods. Its contribution is incremental: it refines evaluation techniques rather than proposing a new solution.

The paper tackles unreliable hallucination detection in RAG systems by applying conformal prediction to obtain statistical guarantees. It finds that embedding-based methods fail catastrophically on real hallucinations, reaching a 100% false positive rate at the target coverage, while GPT-4 as a judge achieves a 7% false positive rate on the same data, showing that the task is solvable but challenging.

Retrieval-Augmented Generation (RAG) systems remain susceptible to hallucinations despite grounding in retrieved evidence. While current detection methods leverage embedding similarity and natural language inference (NLI), their reliability in safety-critical settings remains unproven. We apply conformal prediction to RAG hallucination detection, transforming heuristic scores into decision sets with finite-sample coverage guarantees at level 1 - alpha. Using calibration sets of n = 600, we demonstrate a fundamental dichotomy: on synthetic hallucinations (Natural Questions), embedding methods achieve 95% coverage with 0% False Positive Rate (FPR). However, on real hallucinations from RLHF-aligned models (HaluEval), the same methods fail catastrophically, yielding 100% FPR at target coverage. We analyze this failure through the lens of distributional tails, showing that while NLI models achieve acceptable AUC (0.81), the "hardest" hallucinations are semantically indistinguishable from faithful responses, forcing conformal thresholds to reject nearly all valid outputs. Crucially, GPT-4 as a judge achieves 7% FPR (95% CI: [3.4%, 13.7%]) on the same data, proving the task is solvable via reasoning but opaque to surface-level semantics, a phenomenon we term the "Semantic Illusion."
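
The tail effect described in the abstract can be illustrated with a minimal sketch of split conformal calibration. This is not the paper's code. It assumes a detector that assigns each response a score where higher means "more likely hallucinated", and it picks the threshold as a lower-order statistic of the calibration hallucination scores, which is one standard way to obtain a 1 - alpha coverage guarantee under exchangeability. The function names and the toy score lists are hypothetical and chosen only to show how a small group of low-scoring hallucinations drags the threshold down and inflates the FPR.

```python
import math


def conformal_threshold(halluc_scores, alpha):
    """Threshold tau such that a new exchangeable hallucination scores
    >= tau with probability at least 1 - alpha (split conformal)."""
    n = len(halluc_scores)
    k = math.floor(alpha * (n + 1))  # index of the order statistic to use
    if k < 1:
        # Too few calibration points for this alpha: flag everything.
        return float("-inf")
    return sorted(halluc_scores)[k - 1]  # k-th smallest calibration score


def false_positive_rate(faithful_scores, tau):
    """Fraction of faithful responses that would be flagged at threshold tau."""
    flagged = sum(1 for s in faithful_scores if s >= tau)
    return flagged / len(faithful_scores)


# Toy data (assumed, not from the paper). Faithful responses score in [0.1, 0.5].
faithful = [0.1 + 0.4 * i / 199 for i in range(200)]

# Case 1: hallucinations are cleanly separated from faithful responses.
separated = [0.7 + 0.3 * i / 599 for i in range(600)]

# Case 2: 10% of hallucinations score below every faithful response,
# i.e. the detector sees them as at least as "safe" as faithful answers.
hard_tail = [0.0] * 60 + [0.7 + 0.3 * i / 539 for i in range(540)]

alpha = 0.05
tau_separated = conformal_threshold(separated, alpha)
tau_hard = conformal_threshold(hard_tail, alpha)

fpr_separated = false_positive_rate(faithful, tau_separated)
fpr_hard = false_positive_rate(faithful, tau_hard)

print("separated:", tau_separated, fpr_separated)
print("hard tail:", tau_hard, fpr_hard)
```

In the second toy case the hard hallucinations make up more than alpha of the calibration set, so the threshold must drop to their score to keep coverage, and every faithful response is then flagged. A ranking metric such as AUC can remain high in this situation because it averages over all pairs rather than focusing on the tail.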
