Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks
This work addresses copyright compliance challenges for LLM developers and users, though the contribution appears incremental: it builds on existing detection methods and combines them within a unified framework.
The researchers address the detection of copyright risks in LLM outputs with Copyright Detective, an interactive forensic system that integrates multiple detection paradigms to enable systematic auditing of both verbatim memorization and paraphrase-level leakage.
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. Because copyright law is complex, the system treats the question of infringement versus compliance as an evidence discovery process rather than a static classification task. It integrates multiple detection paradigms, including content recall testing, paraphrase-level similarity analysis, persuasive jailbreak probing, and unlearning verification, within a unified and extensible framework. Through interactive prompting, response collection, and iterative workflows, our system enables systematic auditing of verbatim memorization and paraphrase-level leakage, supporting responsible deployment and transparent evaluation of LLM copyright risks even with black-box access.
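To make the content recall idea concrete, the following is a minimal sketch of how verbatim overlap between a reference passage and a set of collected model responses might be scored. It is not the system's actual implementation: the function names (`longest_common_run`, `ngram_overlap`, `recall_evidence`), the whitespace tokenization, and the choice of trigram overlap are all assumptions made for illustration. The responses are plain strings, so the sketch needs only black-box access to a model.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumption; a real auditor may use a proper tokenizer)."""
    return text.lower().split()


def longest_common_run(reference: str, response: str) -> int:
    """Length, in tokens, of the longest contiguous token run shared by both texts."""
    ref, out = tokenize(reference), tokenize(response)
    # autojunk=False disables difflib's heuristic that ignores frequent tokens in long inputs.
    matcher = SequenceMatcher(None, ref, out, autojunk=False)
    match = matcher.find_longest_match(0, len(ref), 0, len(out))
    return match.size


def ngram_overlap(reference: str, response: str, n: int = 3) -> float:
    """Fraction of the response's distinct n-grams that also occur in the reference."""
    ref, out = tokenize(reference), tokenize(response)
    ref_ngrams = {tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)}
    out_ngrams = {tuple(out[i:i + n]) for i in range(len(out) - n + 1)}
    if not out_ngrams:
        return 0.0
    return len(out_ngrams & ref_ngrams) / len(out_ngrams)


def recall_evidence(reference: str, responses: list[str], n: int = 3) -> list[dict]:
    """Score every collected response and rank by longest verbatim run, strongest evidence first."""
    records = [
        {
            "response": response,
            "longest_run": longest_common_run(reference, response),
            "ngram_overlap": ngram_overlap(reference, response, n),
        }
        for response in responses
    ]
    return sorted(records, key=lambda r: r["longest_run"], reverse=True)


if __name__ == "__main__":
    # Hypothetical reference text and responses, used only to exercise the functions.
    reference = "the quick brown fox jumps over the lazy dog"
    responses = [
        "a completely unrelated sentence about weather",
        "he said the quick brown fox jumps away",
    ]
    for record in recall_evidence(reference, responses):
        print(record["longest_run"], round(record["ngram_overlap"], 2), record["response"])
```

Ranking several responses rather than scoring a single one fits the evidence discovery framing: a long verbatim run in any one sample is worth surfacing for review, even if other samples show no overlap. Exact token matching of this kind would not capture paraphrase-level leakage, which would need a semantic similarity measure instead.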