CL AI Oct 29, 2025

Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy

arXiv:2510.25378v1, 2 citations, h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses unreliable bibliographic recommendations, a problem for researchers and other LLM users, but it is incremental: it builds on prior studies of hallucination and memorization.

The study tackles the problem of LLMs hallucinating non-existent papers in bibliographic recommendation by treating citation frequency as a proxy for training data redundancy. It finds that hallucination rates vary by domain, that citation count correlates strongly with factual accuracy, and that bibliographic information is memorized almost verbatim beyond about 1,000 citations.

Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. They have also been used to assist in bibliographic recommendation, but the hallucination of non-existent papers remains a major issue. Building on prior studies, this study hypothesizes that an LLM's ability to correctly produce bibliographic information depends on whether the underlying knowledge is generated or memorized, with highly cited papers (i.e., those that appear more frequently in the training corpus) showing lower hallucination rates. We therefore treat citation count as a proxy for training data redundancy (i.e., how often a given bibliographic record is repeated in the pretraining corpus) and investigate how citation frequency affects hallucinated references in LLM outputs. Using GPT-4.1, we generated and manually verified 100 bibliographic records across twenty computer-science domains, and measured factual consistency via cosine similarity between generated and authentic metadata. The results revealed that (i) hallucination rates vary across research domains, (ii) citation count is strongly correlated with factual accuracy, and (iii) bibliographic information is memorized almost verbatim beyond approximately 1,000 citations. These findings suggest that highly cited papers are retained nearly verbatim in the model, indicating a threshold where generalization shifts into memorization.
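The abstract does not say how the metadata was vectorized before computing cosine similarity, so the following is only a minimal sketch under an assumed bag-of-words representation. The function names `tokenize` and `cosine_similarity` and the placeholder records are hypothetical illustrations, not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens; the paper's actual
    # preprocessing is not described in the abstract.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(generated: str, authentic: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two strings."""
    a = Counter(tokenize(generated))
    b = Counter(tokenize(authentic))
    # Dot product over the tokens the two records share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty record shares nothing with any other record
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up placeholder records, not real references.
    authentic = "Author One, Author Two. Example Title A. Venue X, 2020."
    generated = "Author One, Author Three. Example Title A. Venue Y, 2021."
    print(f"similarity = {cosine_similarity(generated, authentic):.3f}")
```

Under this representation, a score near 1 means the generated record reproduces the authentic metadata almost token for token, while a score near 0 suggests a fabricated or badly distorted reference. An embedding-based representation would behave differently on paraphrased titles, so any thresholds would need to be chosen per representation.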
