Segmenting Human-LLM Co-authored Text via Change Point Detection

Mengchu Li, Jin Zhu, Jinglai Li, Chengchun Shi

arXiv:2605.03723 · 7.3

Predicted impact: top 52% in CL · last 90 days
Originality: Incremental advance

AI Analysis

For researchers and practitioners who need to localize LLM-generated segments in mixed text, this paper provides a principled segmentation approach, though it is an incremental adaptation of existing methods.

The paper addresses the problem of segmenting human-LLM co-authored text into human-written and LLM-generated pieces. By adapting change point detection methods, the authors report strong performance against existing baselines and establish minimax optimality guarantees for their procedure.

The rise of large language models (LLMs) has created an urgent need to distinguish between human-written and LLM-generated text to ensure authenticity and societal trust. Existing detectors typically provide a binary classification for an entire passage; however, this is insufficient for human-LLM co-authored text, where the objective is to localize specific segments authored by humans or LLMs. To bridge this gap, we propose algorithms to segment text into human- and LLM-authored pieces. Our key observation is that such a segmentation task is conceptually similar to classical change point detection in time-series analysis. Leveraging this analogy, we adapt change point detection to LLM-generated text detection, develop a weighted algorithm and a generalized algorithm to accommodate heterogeneous detection score variability, and establish the minimax optimality of our procedure. Empirically, we demonstrate the strong performance of our approach against a wide range of existing baselines.
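To make the change point analogy concrete, here is a minimal sketch of the classical single change point CUSUM estimator applied to a sequence of detection scores. It is not the paper's weighted or generalized algorithm. It assumes one score per text unit (for example, per sentence), at most one authorship switch, and roughly equal score variance on both sides. The function name `cusum_change_point` and the toy scores are hypothetical.

```python
import numpy as np

def cusum_change_point(scores):
    """Locate a single mean shift in a 1-D score sequence (classical CUSUM).

    Returns (k, stat): scores[:k] and scores[k:] are the two estimated
    segments, and stat is the CUSUM statistic at that split.
    Illustrative only; assumes equal variance in both segments.
    """
    x = np.asarray(scores, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two scores")
    k = np.arange(1, n)              # candidate split points 1..n-1
    prefix = np.cumsum(x)[:-1]       # sum of x[:k] for each candidate k
    total = x.sum()
    left_mean = prefix / k
    right_mean = (total - prefix) / (n - k)
    # Scaled absolute difference between the two segment means
    stat = np.sqrt(k * (n - k) / n) * np.abs(left_mean - right_mean)
    best = int(np.argmax(stat))
    return int(k[best]), float(stat[best])

# Toy input: five low scores followed by five high scores (hypothetical values)
toy_scores = [0.0] * 5 + [1.0] * 5
split, stat = cusum_change_point(toy_scores)
# split == 5: the first five units form one segment, the rest the other
```

Text with several authorship switches would need a multiple change point extension, and heterogeneous score variability is the case the paper's weighted and generalized algorithms are described as addressing.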
