CL · AI · Nov 12, 2025

Hallucinate or Memorize? The Two Sides of Probabilistic Learning in Large Language Models

arXiv:2511.08877v1 · 3 citations · h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses the problem of hallucinated references in LLM outputs, which affects researchers and other users in academic and citation recommendation contexts. It is rated incremental because it builds on prior studies.

This study investigated how citation frequency affects hallucination rates in LLMs when generating bibliographic records, finding that citation count is strongly correlated with factual accuracy and that bibliographic information becomes almost verbatim memorized beyond roughly 1,000 citations.

Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in citation recommendation, the hallucination of non-existent papers remains a major issue. Building on prior studies, this study hypothesizes that an LLM's ability to correctly produce bibliographic records depends on whether the underlying knowledge is generated or memorized, with highly cited papers (i.e., those that appear more frequently in the pretraining corpus) showing lower hallucination rates. We therefore treat citation count as a proxy for training data redundancy (i.e., the frequency with which a given bibliographic record appears in the pretraining corpus) and investigate how citation frequency affects hallucinated references in LLM outputs. Using GPT-4.1, we generated and manually verified 100 citations across twenty computer-science domains, and measured factual consistency via cosine similarity between generated and authentic metadata. The results revealed that (i) citation count is strongly correlated with factual accuracy, (ii) bibliographic information becomes almost verbatim memorized beyond roughly 1,000 citations, and (iii) memory interference occurs when multiple highly cited papers share similar content. These findings indicate a threshold where generalization shifts into memorization, with highly cited papers being nearly verbatim retained in the model.
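
The abstract's consistency measure, cosine similarity between generated and authentic metadata, can be illustrated with a minimal sketch. The abstract does not say how the metadata is turned into vectors, so the bag-of-words token counts used here are an assumption, not the authors' method. The function names and the two bibliographic strings are hypothetical placeholders.

```python
import math
import re
from collections import Counter


def token_counts(text):
    """Lowercase the text and count alphanumeric tokens.

    Assumption: a simple bag-of-words representation; the paper may
    use a different vectorization (e.g. embeddings).
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the token-count vectors of two strings."""
    va, vb = token_counts(a), token_counts(b)
    # Dot product over tokens that occur in both strings.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty record shares nothing with any other record.
    return dot / (norm_a * norm_b)


# Hypothetical records, made up for illustration only.
authentic = "A. Author and B. Writer. Example Paper Title. Example Conference, 2020."
generated = "A. Author and C. Other. Example Paper Title. Another Venue, 2021."

score = cosine_similarity(authentic, generated)
print(f"similarity = {score:.3f}")
```

Under this representation, a score near 1 means the generated record reproduces the authentic one almost token for token, while a lower score reflects partially or fully hallucinated fields.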
