SI · AI · DL · May 15

CitePrism: Human-in-the-Loop AI for Citation Auditing and Editorial Integrity

Gowrika Mahesh, Budanur Madappa Darshan Gowda, Kavana Gopladevarahalli Papegowda, Prajwal Basavaraj, Binh Vu, Swati Chandna, Mehrdad Jalali

arXiv:2605.160009.4

Predicted impact: top 80% in SI · last 90 days · Originality: Synthesis-oriented

AI Analysis

For editors and reviewers, CitePrism offers a pilot-stage tool to assist in citation auditing, but its performance is preliminary and requires broader validation.

CitePrism is a hybrid decision-support framework for editorial citation auditing that combines LLM reasoning, embedding similarity, metadata verification, and human review. In a case study of a single manuscript with 104 references, it reached Cohen's kappa = 0.429 against human relevance labels and, at threshold tau = 17, flagged all human-labeled irrelevant citations, though it also produced false positives.
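Both headline figures can be made concrete. Cohen's kappa corrects raw agreement for chance, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each rater's label frequencies. The Python sketch below computes kappa alongside threshold outcomes. It assumes a citation is flagged when its score falls *below* tau (the summary does not state the direction), the function names are hypothetical, and every label and score is a toy value, not data from the case study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters; labels are 1 = relevant, 0 = irrelevant."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement p_o: share of citations both raters labeled the same way.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of the two raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use one identical label
    return (p_o - p_e) / (1.0 - p_e)


def threshold_outcomes(scores, human_relevant, tau):
    """Outcome counts under the ASSUMED rule: flag a citation when score < tau."""
    flagged = [s < tau for s in scores]
    pairs = list(zip(flagged, human_relevant))
    return {
        "caught": sum(f and not r for f, r in pairs),           # irrelevant and flagged
        "missed": sum(not f and not r for f, r in pairs),       # irrelevant, not flagged
        "false_positives": sum(f and r for f, r in pairs),      # relevant but flagged
    }


# Toy example (not the paper's data): six citations, human labels vs. fused scores.
human = [1, 1, 1, 0, 0, 1]
scores = [80, 60, 15, 10, 5, 40]
print(threshold_outcomes(scores, [bool(h) for h in human], tau=17))
# {'caught': 2, 'missed': 0, 'false_positives': 1}
system = [0 if s < 17 else 1 for s in scores]  # binary labels implied by the threshold
print(round(cohens_kappa(human, system), 3))  # 0.667
```

The toy case mirrors the qualitative pattern described above: every irrelevant citation is caught, one relevant citation is flagged anyway, and that false positive pulls kappa below 1.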

Editors and reviewers are expected to ensure that manuscripts cite relevant, accurate, current, and ethically appropriate literature, yet manuscript-level citation auditing remains largely manual, fragmented, and difficult to scale. Citation context, metadata quality, self-citation patterns, and bibliographic integrity all affect whether a reference appropriately supports a local claim. We present CitePrism, a transparent hybrid decision-support framework for editorial citation auditing that combines LLM-assisted contextual reasoning, embedding-based semantic similarity, metadata verification, integrity-oriented flags, and human-in-the-loop analyst review. CitePrism extracts citation neighborhoods, enriches reference metadata, computes fused relevance scores, surfaces metadata and self-citation review prompts, and supports configurable threshold-based triage. In a preliminary validation on a single case-study manuscript with 104 references from pavement engineering, agreement with human binary relevance labels reached Cohen's kappa = 0.429. At the operating threshold tau = 17, CitePrism flagged all human-labeled irrelevant citations, while also producing false positives that required analyst review. These results suggest that CitePrism may support conservative editorial screening and citation-quality triage, but they do not establish general editorial performance. CitePrism is intended as pilot-stage decision support, not as an autonomous misconduct detector or automated editorial decision system. Broader validation across manuscripts, domains, annotators, baselines, and deployment settings is required before operational use.
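The pipeline stages named in the abstract (citation-neighborhood extraction, fused relevance scoring, metadata and self-citation prompts, threshold triage) can be sketched in Python. This is a minimal illustration under stated assumptions, not CitePrism's implementation: a bag-of-words cosine stands in for the embedding model, `llm_score` is a fixed number standing in for an LLM judgment, and the linear fusion weight `w_llm`, the 0-100 score scale, the flag-below-tau rule, and the author-overlap self-citation check are all hypothetical. The names `citation_neighborhood`, `fused_score`, `triage`, and `Reference` are invented for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def citation_neighborhood(text: str, marker: str, window: int = 1) -> str:
    """Sentences containing `marker`, plus `window` neighboring sentences per side."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    keep = set()
    for i, sentence in enumerate(sentences):
        if marker in sentence:
            keep.update(range(max(0, i - window), min(len(sentences), i + window + 1)))
    return " ".join(sentences[i] for i in sorted(keep))


def bow_vector(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


@dataclass
class Reference:
    ref_id: str            # citation marker as it appears in the text, e.g. "[3]"
    abstract: str
    llm_score: float       # hypothetical LLM relevance judgment, scaled to [0, 1]
    metadata_ok: bool = True
    authors: set = field(default_factory=set)


def fused_score(context: str, ref: Reference, w_llm: float = 0.6) -> float:
    """Hypothetical linear fusion on a 0-100 scale; the weight is an assumption."""
    sim = cosine(bow_vector(context), bow_vector(ref.abstract))  # in [0, 1] for counts
    return 100.0 * (w_llm * ref.llm_score + (1.0 - w_llm) * sim)


def triage(manuscript: str, manuscript_authors: set, refs: list, tau: float = 17.0):
    """One row per reference: (marker, fused score, review prompts for the analyst)."""
    report = []
    for ref in refs:
        score = fused_score(citation_neighborhood(manuscript, ref.ref_id), ref)
        prompts = []
        if score < tau:  # ASSUMED direction: low scores are flagged
            prompts.append("low relevance: analyst review")
        if not ref.metadata_ok:
            prompts.append("metadata mismatch")
        if ref.authors & manuscript_authors:  # hypothetical self-citation check
            prompts.append("self-citation")
        report.append((ref.ref_id, round(score, 1), prompts))
    return report


# Toy usage (invented text and references, not from the case study).
manuscript = (
    "Rutting depends on asphalt binder stiffness [1]. Traffic loading also matters. "
    "Coffee consumption rose in 2010 [2]."
)
refs = [
    Reference("[1]", "Asphalt binder stiffness controls rutting resistance in flexible pavement.", 0.9),
    Reference("[2]", "Asphalt binder aging under ultraviolet exposure.", 0.05,
              metadata_ok=False, authors={"Doe"}),
]
for row in triage(manuscript, {"Doe"}, refs):
    print(row)
```

Keeping metadata and self-citation checks as separate prompts, rather than folding them into the relevance score, follows the abstract's framing of them as review prompts for a human analyst; whether CitePrism combines its signals this way is not stated here.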

