Jin Luo

CR | Feb 6 | Code

GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models

Zuyao Xu, Yuqi Qiu, Lu Sun et al.

Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, yet their tendency to fabricate citations ("ghost citations") poses a systemic threat to citation validity. To quantify this threat and inform mitigation, we develop CiteVerifier, an open-source framework for large-scale citation verification, and conduct the first comprehensive study of citation validity in the LLM era through three experiments built on it. We benchmark 13 state-of-the-art LLMs on citation generation across 40 research domains, finding that all models hallucinate citations at rates from 14.23% to 94.93%, with significant variation across research domains. Moreover, we analyze 2.2 million citations from 56,381 papers published at top-tier AI/ML and Security venues (2020–2025), confirming that 1.07% of papers contain invalid or fabricated citations (604 papers), with an 80.9% increase in 2025 alone. Furthermore, we survey 97 researchers and analyze 94 valid responses after removing 3 conflicting samples, revealing a critical "verification gap": 41.5% of researchers copy-paste BibTeX without checking and 44.4% choose no-action responses when encountering suspicious references; meanwhile, 76.7% of reviewers do not thoroughly check references and 80.0% never suspect fake citations. Our findings reveal an accelerating crisis where unreliable AI tools, combined with inadequate human verification by researchers and insufficient peer review scrutiny, enable fabricated citations to contaminate the scientific record. We propose interventions for researchers, venues, and tool developers to protect citation integrity.
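As a quick sanity check on the figures quoted in this abstract, the short sketch below recomputes the share of flagged papers and the number of valid survey responses. The variable names are illustrative only; the numbers are taken directly from the abstract, and nothing here reproduces the CiteVerifier framework itself.

```python
# Figures quoted in the abstract (variable names are hypothetical).
total_papers = 56_381      # papers analyzed
flagged_papers = 604       # papers with invalid or fabricated citations
surveyed = 97              # researchers surveyed
removed = 3                # conflicting samples removed

# Share of analyzed papers containing at least one invalid citation.
flagged_share = 100 * flagged_papers / total_papers

# Responses remaining after removing the conflicting samples.
valid_responses = surveyed - removed

print(f"{flagged_share:.2f}% of papers flagged")
print(f"{valid_responses} valid survey responses")
```

Both values agree with the abstract: roughly 1.07% of papers and 94 valid responses.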

4.7 | AR | Apr 4

Efficient Solving for Dynamic Data Structure Constraint Satisfaction Problem

Nanbing Li, Weijie Peng, Jin Luo et al.

Functional verification plays a central role in ensuring the correctness of modern integrated circuit designs, where constrained-random verification is widely adopted to generate diverse stimuli under high-level constraints. In industrial verification environments, constraint solving increasingly involves dynamic data structures whose shape and content are determined at runtime. As a result, the sets of variables and constraint instances evolve across solver invocations, which leads to substantial overhead when nested, high-dimensional structures repeatedly expand across solves. We formalize this class of problems as the Dynamic Data Structure Constraint Satisfaction Problem (D2SCSP), which captures the interaction between dynamic data structure expansion and constraint evaluation. We propose a dependency-guided problem partitioning framework combined with an incremental encoding and constraint activation mechanism, enabling reuse of solver state and encodings across multiple solves. The framework is integrated into an industrial SystemVerilog verification flow and implemented in the commercial simulator VeriSim. Experimental results on industrial benchmarks demonstrate significant performance improvements, achieving an average speedup of 24.80x over a baseline and 1.72x over a state-of-the-art commercial simulator, highlighting the practicality of the approach for real-world verification workflows.
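To give a rough feel for what dependency-guided partitioning can mean, the sketch below groups constraints that share variables into independent subproblems using a union-find structure. This is an assumption-laden illustration, not the authors' framework: the function `partition_constraints`, the constraint names, and the choice of union-find are all hypothetical, and the abstract does not describe the actual algorithm.

```python
from collections import defaultdict


def partition_constraints(constraints):
    """Group constraints into independent subproblems (illustrative only).

    constraints: dict mapping a constraint name to the set of variable
    names it mentions. Constraints that share a variable, directly or
    transitively, end up in the same group.
    """
    parent = {}

    def find(x):
        # Find the representative of x, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merge all variables that appear together in one constraint.
    for variables in constraints.values():
        ordered = sorted(variables)
        for v in ordered:
            find(v)
        for v in ordered[1:]:
            union(ordered[0], v)

    # Assign each constraint to the group of its variables.
    # Constraints with no variables are skipped in this sketch.
    groups = defaultdict(list)
    for name, variables in constraints.items():
        if variables:
            groups[find(min(variables))].append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example: c1 and c2 share "b"; c3 and c4 share "x".
example = {
    "c1": {"a", "b"},
    "c2": {"b", "c"},
    "c3": {"x"},
    "c4": {"x", "y"},
}
print(partition_constraints(example))
```

Each resulting group could in principle be handed to a solver separately, so a change to one group's data structures would not force the others to be re-encoded. How the paper actually partitions and reuses solver state is not specified in the abstract.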

2 Papers