HalluZig: Hallucination Detection using Zigzag Persistence
HalluZig addresses the factual unreliability of LLMs, a critical barrier to their use in high-stakes domains, with a novel detection approach.
The paper tackles the problem of hallucination detection in Large Language Models by analyzing the dynamic topology of layer-wise attention evolution, demonstrating that HalluZig outperforms strong baselines on multiple benchmarks.
The factual reliability of Large Language Models (LLMs) remains a critical barrier to their adoption in high-stakes domains due to their propensity to hallucinate. Current detection methods often rely on surface-level signals from the model's output, overlooking the failures that occur within the model's internal reasoning process. In this paper, we introduce a new paradigm for hallucination detection by analyzing the dynamic topology of how the model's layer-wise attention evolves. We model the sequence of attention matrices as a zigzag graph filtration and use zigzag persistence, a tool from Topological Data Analysis, to extract a topological signature. Our core hypothesis is that factual and hallucinated generations exhibit distinct topological signatures. We validate our framework, HalluZig, on multiple benchmarks, demonstrating that it outperforms strong baselines. Furthermore, our analysis reveals that these topological signatures generalize across different models and that hallucination detection is possible using only structural signatures from partial network depth.
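To make the idea of a zigzag graph filtration concrete, the following is a minimal sketch and not the paper's implementation. It assumes each layer's attention matrix is turned into a graph by thresholding symmetrized weights (the thresholding rule, the function names, and the toy matrices are all illustrative assumptions), and it interleaves consecutive layer graphs with their unions so that every step in the sequence is an inclusion. As a simplified stand-in for a full zigzag persistence barcode, which would require a dedicated TDA library, it reports only the Betti numbers of each graph in the sequence.

```python
from itertools import combinations


def attention_graph(attn, threshold):
    """Edge set of an undirected token graph. Assumed rule: tokens i and j
    are linked when the larger of attn[i][j], attn[j][i] reaches threshold."""
    n = len(attn)
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if max(attn[i][j], attn[j][i]) >= threshold
    }


def zigzag_sequence(layer_graphs):
    """Interleave layer graphs with unions of consecutive layers:
    G_0 -> (G_0 | G_1) <- G_1 -> (G_1 | G_2) <- G_2 ...
    Each arrow is an inclusion of edge sets, which is what makes the
    sequence a zigzag filtration."""
    if not layer_graphs:
        return []
    seq = []
    for g, g_next in zip(layer_graphs, layer_graphs[1:]):
        seq.append(g)
        seq.append(g | g_next)
    seq.append(layer_graphs[-1])
    return seq


def betti_numbers(n_vertices, edges):
    """(b0, b1) of a graph: connected components and independent cycles,
    computed with union-find and the Euler characteristic."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n_vertices
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components, len(edges) - n_vertices + components


# Toy causal attention matrices for 4 tokens in 2 layers (made-up values).
layer0 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0],
    [0.1, 0.1, 0.2, 0.6],
]
layer1 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
]

graphs = [attention_graph(a, threshold=0.5) for a in (layer0, layer1)]
signature = [betti_numbers(4, g) for g in zigzag_sequence(graphs)]
print(signature)  # [(2, 0), (1, 1), (1, 0)]
```

In this toy run a cycle exists only in the union graph between the two layers. A zigzag persistence barcode records such a feature as a short interval, whereas the Betti numbers above only count features at each step and cannot match them across steps.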