A Distribution-Free Framework for Rewrite-Based LLM-Generated Text Detection via Knockoff Filtering
For practitioners needing reliable detection of LLM-generated text, this work provides a way to add statistical guarantees to existing rewrite-based detectors.
The paper introduces a distribution-free framework that converts rewrite-based LLM-generated text detectors into ones with finite-sample false discovery rate (FDR) guarantees, without retraining. It demonstrates reliable FDR control with meaningful detection power across three detection models, 19 domains, and four LLMs.
We propose a distribution-free statistical framework that converts arbitrary rewrite-based detectors into detectors with finite-sample false discovery rate (FDR) guarantees, without retraining. Our key observation is that rewrite-based detection implicitly constructs knockoff samples, so LLM-generated text detection can be formulated as a multiple hypothesis testing problem with knockoff structure. This perspective separates the design of detection statistics from the control of false discoveries, allowing existing rewrite detectors to inherit these guarantees through a simple calibration procedure. We demonstrate reliable FDR control with meaningful detection power across three detection models, 19 domains, and four LLMs.
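To make the calibration idea concrete, the following is a minimal sketch of a knockoff-style thresholding step, using the standard knockoff+ rule of Barber and Candès rather than the paper's exact procedure, which the abstract does not spell out. It assumes each text has already been reduced to a signed statistic W_i by contrasting the detector's score on the original text with its score on the rewritten (knockoff) version, that large positive values indicate LLM-generated text, and that the sign of W_i is symmetric for human-written text. The function names `knockoff_plus_threshold` and `select_llm_texts` are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for signed statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |w_i| such that
    (1 + #{i: w_i <= -t}) / max(1, #{i: w_i >= t}) <= q.
    Returns inf when no candidate satisfies the bound (nothing is flagged).
    """
    w = np.asarray(w, dtype=float)
    candidates = np.unique(np.abs(w[w != 0]))  # np.unique returns sorted values
    for t in candidates:
        # Negative statistics act as a conservative count of false discoveries.
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")


def select_llm_texts(w, q):
    """Indices of texts flagged as LLM-generated at target FDR level q."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, q))


# Illustrative, made-up statistics for ten texts (not data from the paper).
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 1.5, -0.5, 0.2]
flagged = select_llm_texts(w_example, q=0.25)
```

Because the threshold depends only on the signs and magnitudes of the W_i, any rewrite-based detector that yields such statistics could in principle be plugged in unchanged, which is the separation between statistic design and error control that the abstract describes.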