CL DL · May 30

Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs

arXiv:2606.00898

AI Analysis

For legal AI applications, this work provides the first automated metric and mitigation method for citation hallucinations, though it is domain-specific and relies on a single jurisdiction's data.

In the authors' evaluation, 13-21% of the legal citations produced by LLMs are hallucinated. The authors propose citation grounding (CG), a metric that detects such errors by verifying citations against a graph built from 100.8M Ukrainian court decisions. They also introduce CG-DPO, which constructs preference pairs without human annotation; a Qwen2.5-7B-Instruct model fine-tuned this way reaches 98.5% validation accuracy in distinguishing correct from corrupted citations.

Large language models systematically hallucinate legal citations -- fabricating statute references, citing repealed provisions, and confusing jurisdictions -- yet no automated method exists to measure or reduce this behavior at scale. We propose citation grounding (CG), a metric that verifies LLM-generated legal citations against a ground-truth citation graph extracted from 100.8 million Ukrainian court decisions (502 million edges, 21,736 unique statute nodes). CG decomposes into three components -- citation precision (does the cited provision exist?), citation relevance (is it contextually appropriate?), and citation temporality (was it valid at the relevant date?) -- enabling differential diagnosis of hallucination types. Empirical evaluation on 100 Ukrainian legal queries across five systems -- four commercial LLMs via AWS Bedrock (Claude Haiku 4.5, Mistral Pixtral Large, Amazon Nova Pro/Lite) and one RAG-augmented production system -- reveals CG ranging from 0.791 to 0.873, with 13-21% of citations hallucinated. To reduce hallucinations without human annotation, we introduce Citation Grounding DPO (CG-DPO): a method that constructs preference pairs algorithmically by corrupting verified citations from real court decisions via four targeted strategies. On a dataset of 2,244 court decisions, a Qwen2.5-7B-Instruct model fine-tuned with LoRA achieves 98.5% mean validation accuracy in distinguishing correct from corrupted citations (rewards margin +14.9, std < 0.3 pp across 3 seeds). The citation graph, evaluation framework, and CG-DPO dataset are released as open resources.
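
The abstract names three CG components and a corruption-based way of building preference pairs, but it does not give formulas or list the four corruption strategies. The following is a minimal Python sketch of how such checks might look on a toy graph. Everything in it is an assumption made for illustration: the `Statute` schema, the statute IDs and dates, the equal-weight mean used to aggregate the components, and the single "fabricated ID" corruption. None of it is taken from the paper's released code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
import random


@dataclass(frozen=True)
class Statute:
    """A statute node in a toy citation graph (hypothetical schema)."""
    statute_id: str
    topics: frozenset
    valid_from: date
    valid_to: Optional[date] = None  # None means still in force


# Toy graph: IDs, topics, and dates are invented for illustration only.
GRAPH = {
    "code-a:art-10": Statute("code-a:art-10", frozenset({"contracts"}),
                             date(2004, 1, 1)),
    "code-a:art-20": Statute("code-a:art-20", frozenset({"property"}),
                             date(2004, 1, 1), date(2015, 12, 31)),
}


def score_citation(cited_id: str, query_topic: str, as_of: date) -> dict:
    """Score one citation on the three components named in the abstract."""
    node = GRAPH.get(cited_id)
    exists = node is not None
    # Precision: does the cited provision exist in the graph?
    precision = 1.0 if exists else 0.0
    # Relevance: crude topic match (assumed proxy for contextual fit).
    relevance = 1.0 if exists and query_topic in node.topics else 0.0
    # Temporality: was the provision in force on the relevant date?
    in_force = exists and node.valid_from <= as_of and (
        node.valid_to is None or as_of <= node.valid_to
    )
    temporality = 1.0 if in_force else 0.0
    return {"precision": precision, "relevance": relevance,
            "temporality": temporality}


def citation_grounding(citations: list, query_topic: str, as_of: date) -> float:
    """Average the component scores over all citations.

    The equal-weight mean is an assumption; the abstract does not state
    how the components are combined.
    """
    if not citations:
        return 0.0
    per_citation = [
        sum(score_citation(c, query_topic, as_of).values()) / 3
        for c in citations
    ]
    return sum(per_citation) / len(per_citation)


def make_preference_pair(text: str, verified_id: str, rng: random.Random) -> dict:
    """Build a (chosen, rejected) pair by swapping in a fabricated statute ID.

    This is one hypothetical corruption strategy, not one of the paper's four.
    """
    prefix = verified_id.split(":")[0]
    fake_id = verified_id
    while fake_id in GRAPH:  # keep drawing until the ID is not in the graph
        fake_id = f"{prefix}:art-{rng.randint(1000, 9999)}"
    return {"chosen": text, "rejected": text.replace(verified_id, fake_id)}
```

Reporting the three components separately, as `score_citation` does, is what would allow the kind of differential diagnosis the abstract describes: a repealed provision fails only temporality, while a fabricated one fails all three.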
