DL AI May 26

CiteCheck: Retrieval-Grounded Detection of LLM Citation Hallucinations in Scientific Text

Khashayar Khajavi, Shaghayegh Sadeghi, Rise Adhikari, Alexander Tessier

arXiv:2605.27700 75.2 h-index: 1

AI Analysis

For researchers and practitioners who use LLMs for scientific writing, CiteCheck offers a way to verify citations and flag references that are fabricated or carry corrupted metadata.

CiteCheck detects citation hallucinations in LLM-generated scientific text by retrieving candidate publications and comparing each citation against them with a structured LLM verifier. On the held-out test set of a 982-citation physics benchmark it achieves 88.7 macro-F1 and 88.9% accuracy, outperforming GPT, Claude, and Gemini baselines.

Large language models (LLMs) are increasingly used to generate scientific reports, but they can produce references that appear plausible while containing corrupted metadata or pointing to papers that do not exist. We introduce CiteCheck, a hybrid framework for citation hallucination detection that verifies whether a citation corresponds to a real scholarly work and whether its metadata is faithful to that work. CiteCheck retrieves candidate publications from external scholarly sources, compares the citation against the retrieved candidate using a structured LLM verifier, and maps verifier scores into three labels: Exact, Minor, and Major. We also construct a 982-citation physics benchmark with controlled corruptions that capture both subtle metadata drift and fully fabricated references. On the held-out test set, CiteCheck achieves 88.7 macro-F1 and 88.9% accuracy, outperforming GPT, Claude, and Gemini baselines, including web-search and few-shot variants. These results show that reliable citation verification benefits from combining scholarly retrieval, structured LLM-based comparison, and calibrated decision rules.
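The retrieve, verify, and label steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The in-memory `index` stands in for external scholarly sources, `difflib` string similarity stands in for the structured LLM verifier, and the names `Citation`, `retrieve`, `verify`, `classify` and both thresholds are hypothetical. The sketch also assumes that Minor corresponds to metadata drift on a real paper and Major to a reference with no plausible match, which the abstract suggests but does not define.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Citation:
    title: str
    authors: tuple[str, ...]
    year: int


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve(citation: Citation, index: list[Citation]) -> Optional[Citation]:
    """Stand-in for scholarly retrieval: best title match in a local index."""
    if not index:
        return None
    return max(index, key=lambda c: similarity(citation.title, c.title))


def verify(citation: Citation, candidate: Citation) -> dict[str, float]:
    """Stand-in for the LLM verifier: one score in [0, 1] per metadata field."""
    cited = {a.lower() for a in citation.authors}
    found = {a.lower() for a in candidate.authors}
    author_score = len(cited & found) / max(len(cited | found), 1)
    return {
        "title": similarity(citation.title, candidate.title),
        "authors": author_score,
        "year": 1.0 if citation.year == candidate.year else 0.0,
    }


# Hypothetical decision thresholds; the paper's calibrated rules are not given here.
EXACT_THRESHOLD = 0.95  # every field must score at least this for "Exact"
MATCH_THRESHOLD = 0.60  # title must score at least this to count as a real match


def classify(citation: Citation, index: list[Citation]) -> str:
    """Map verifier scores to one of the three labels: Exact, Minor, Major."""
    candidate = retrieve(citation, index)
    if candidate is None:
        return "Major"  # nothing retrieved at all
    scores = verify(citation, candidate)
    if scores["title"] < MATCH_THRESHOLD:
        return "Major"  # best candidate is not plausibly the cited work
    if min(scores.values()) >= EXACT_THRESHOLD:
        return "Exact"  # all fields agree with the retrieved record
    return "Minor"  # real paper, but some metadata has drifted
```

With an index holding one real record, a citation that copies it exactly would be labeled Exact, the same citation with a wrong year would be labeled Minor, and a citation whose title matches nothing in the index would be labeled Major. A real system would replace both stand-ins and tune the thresholds on held-out data.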
