Decoupling Content and Expression: Two-Dimensional Detection of AI-Generated Text
Systematic detection of AI participation in text has become critical given the widespread use of LLMs; this work proposes a unified approach to the problem rather than an incremental improvement on existing detectors.
The paper proposes HART, a hierarchical framework of AI risk levels, together with a 2D detection method that decouples a text's content from its language expression, raising AUROC from 0.705 to 0.849 for level-2 detection and from 0.807 to 0.886 on the RAID benchmark.
The wide usage of LLMs creates a critical need to detect AI participation in texts. Existing studies examine such detection in scattered contexts, leaving a systematic and unified approach unexplored. In this paper, we present HART, a hierarchical framework of AI risk levels, each corresponding to a detection task. To address these tasks, we propose a novel 2D Detection Method that decouples a text into content and language expression. Our findings show that content is resistant to surface-level changes and can therefore serve as a key feature for detection. Experiments demonstrate that the 2D method significantly outperforms existing detectors, improving AUROC from 0.705 to 0.849 for level-2 detection and from 0.807 to 0.886 on RAID. We release our data and code at https://github.com/baoguangsheng/truth-mirror.
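The results above are reported as AUROC, a threshold-free ranking metric. As a minimal sketch (not the authors' evaluation code, which lives in the linked repository), AUROC equals the probability that a randomly chosen positive example receives a higher detector score than a randomly chosen negative one, with ties counted as half. The function name `auroc` and the toy scores below are illustrative assumptions, and treating the detected class (e.g. AI-generated text) as the positive class is likewise an assumption.

```python
from itertools import product


def auroc(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) AUROC.

    Assumption: pos_scores are detector scores for the class being detected
    (e.g. AI-generated text) and neg_scores for the other class, where a
    higher score means "more likely positive". Ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    ai_scores = [0.9, 0.8, 0.4]     # hypothetical scores for AI-generated texts
    human_scores = [0.7, 0.3, 0.2]  # hypothetical scores for human-written texts
    print(auroc(ai_scores, human_scores))  # 8 of 9 pairs ranked correctly
```

Under this reading, 0.5 corresponds to chance and 1.0 to perfect separation, so the level-2 improvement from 0.705 to 0.849 means a correctly ordered positive/negative pair becomes 0.144 more likely. The pairwise loop is O(n·m) and chosen for clarity; a rank-based formula gives the same value in O(n log n) for large evaluation sets.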