The Provenance Problem: LLMs and the Breakdown of Citation Norms
This addresses a foundational problem for researchers and the scientific community, highlighting an incremental challenge to existing norms.
The paper tackles the issue of generative AI in scientific writing leading to uncredited use of ideas from sources, which it terms the 'provenance problem', and argues that this creates a novel category of attributional harm that current ethical frameworks fail to address, threatening scholarly integrity and fairness.
The increasing use of generative AI in scientific writing raises urgent questions about attribution and intellectual credit. When a researcher employs ChatGPT to draft a manuscript, the resulting text may echo ideas from sources the author has never encountered. If an AI system reproduces insights from, for example, an obscure 1975 paper without citation, does this constitute plagiarism? We argue that such cases exemplify the 'provenance problem': a systematic breakdown in the chain of scholarly credit. Unlike conventional plagiarism, this phenomenon does not involve intent to deceive (researchers may disclose AI use and act in good faith) yet still benefit from the uncredited intellectual contributions of others. This dynamic creates a novel category of attributional harm that current ethical and professional frameworks fail to address. As generative AI becomes embedded across disciplines, the risk that significant ideas will circulate without recognition threatens both the reputational economy of science and the demands of epistemic justice. This Perspective analyzes how AI challenges established norms of authorship, introduces conceptual tools for understanding the provenance problem, and proposes strategies to preserve integrity and fairness in scholarly communication.