HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment
HalluGraph addresses a critical accountability challenge for legal practitioners, who need verifiable guarantees that AI-generated text faithfully represents source documents, because errors can have material consequences.
The paper tackles hallucination detection in legal AI systems built on retrieval-augmented generation (RAG). It introduces HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs, achieving an AUC of 0.979 on structured control documents and an AUC of approximately 0.89 on generative legal tasks.
Legal AI systems powered by retrieval-augmented generation (RAG) face a critical accountability challenge: when an AI assistant cites case law, statutes, or contractual clauses, practitioners need verifiable guarantees that generated text faithfully represents source documents. Existing hallucination detectors rely on semantic similarity metrics that tolerate entity substitutions, a dangerous failure mode when confusing parties, dates, or legal provisions can have material consequences. We introduce HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs extracted from context, query, and response. Our approach produces bounded, interpretable metrics decomposed into \textit{Entity Grounding} (EG), measuring whether entities in the response appear in source documents, and \textit{Relation Preservation} (RP), verifying that asserted relationships are supported by context. On structured control documents ($>$400 words, $>$20 entities), HalluGraph achieves near-perfect discrimination ($AUC = 0.979$), while maintaining robust performance ($AUC \approx 0.89$) on challenging generative legal tasks, consistently outperforming semantic similarity baselines. The framework provides the transparency and traceability required for high-stakes legal applications, enabling full audit trails from generated assertions back to source passages.
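To make the two metrics concrete, the following is a minimal sketch of how EG and RP could be computed once knowledge graphs are available as sets of (subject, relation, object) triples. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the `KnowledgeGraph` class, the exact-match comparison after lowercasing, the choice to count query entities as grounded, and the convention that an empty response scores 1.0. The graph extraction step itself is not shown.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]  # (subject, relation, object)


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


@dataclass(frozen=True)
class KnowledgeGraph:
    """Hypothetical container: a graph stored as a set of normalized triples."""
    triples: frozenset[Triple]

    @classmethod
    def from_triples(cls, triples):
        return cls(frozenset((_norm(s), _norm(r), _norm(o)) for s, r, o in triples))

    @property
    def entities(self) -> frozenset[str]:
        # Every subject and object counts as an entity.
        return frozenset(e for s, _, o in self.triples for e in (s, o))


def entity_grounding(response, context, query) -> float:
    """Assumed EG: share of response entities found in the context or query."""
    resp = response.entities
    if not resp:
        return 1.0  # assumption: nothing asserted, nothing ungrounded
    known = context.entities | query.entities
    return len(resp & known) / len(resp)


def relation_preservation(response, context) -> float:
    """Assumed RP: share of response triples that also appear in the context."""
    if not response.triples:
        return 1.0  # assumption: no relations asserted
    return len(response.triples & context.triples) / len(response.triples)


# Illustrative data: the response substitutes the date of the agreement.
context = KnowledgeGraph.from_triples([
    ("Acme Corp", "signed", "Lease Agreement"),
    ("Lease Agreement", "dated", "2021-03-01"),
])
query = KnowledgeGraph.from_triples([])
response = KnowledgeGraph.from_triples([
    ("Acme Corp", "signed", "Lease Agreement"),
    ("Lease Agreement", "dated", "2022-03-01"),
])

eg = entity_grounding(response, context, query)
rp = relation_preservation(response, context)
print(f"EG = {eg:.3f}, RP = {rp:.3f}")
```

In this toy case the substituted date lowers both scores (two of three response entities are grounded, one of two relations is preserved), and the unsupported triple can be traced directly, which is the kind of entity substitution that a semantic similarity score may tolerate. Both values are bounded in $[0, 1]$ by construction.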