LG | MM | NE | ML | Oct 28, 2024

Segmenting Watermarked Texts From Language Models

arXiv:2410.20670v1 | 5 citations | h-index: 20 | Has Code | NIPS
Originality: Incremental advance
AI Analysis

This work addresses the challenge of tracing AI-generated content for verification and security, but it is incremental as it builds on existing watermarking and detection techniques.

The paper tackles the problem of detecting and segmenting watermarked text from language models, even after users have modified it through substitutions, insertions, or deletions. It develops a statistical method based on randomization tests and change point detection that controls Type I and Type II errors, and it reports encouraging numerical results on texts generated by several language models from prompts extracted from Google's C4 dataset.

Watermarking is a technique that involves embedding nearly unnoticeable statistical signals within generated content to help trace its source. This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted language model (LLM) provider, who then generates a text from their LLM with a watermark. This setup makes it possible for a detector to later identify the source of the text if the user publishes it. The user can modify the generated text by substitutions, insertions, or deletions. Our objective is to develop a statistical method to detect if a published text is LLM-generated from the perspective of a detector. We further propose a methodology to segment the published text into watermarked and non-watermarked sub-strings. The proposed approach is built upon randomization tests and change point detection techniques. We demonstrate that our method ensures Type I and Type II error control and can accurately identify watermarked sub-strings by finding the corresponding change point locations. To validate our technique, we apply it to texts generated by several language models with prompts extracted from Google's C4 dataset and obtain encouraging numerical results. We release all code publicly at https://github.com/doccstat/llm-watermark-cpd.
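To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of a randomization test followed by change point estimation on a sequence of p-values. It is not the authors' implementation (see the linked repository for that). The toy watermark (each token is determined by a uniform key), the `alignment` statistic, the sliding-window width, the CUSUM-style split rule, and every function name below are assumptions chosen only for illustration.

```python
import random

VOCAB = 50  # hypothetical vocabulary size for the toy example


def alignment(tokens, keys):
    """Toy test statistic: larger when tokens line up with the watermark key."""
    return -sum(abs(k - (t + 0.5) / VOCAB) for t, k in zip(tokens, keys))


def randomization_pvalue(tokens, keys, rng, n_draws=100):
    """Compare the observed statistic with statistics under freshly drawn keys."""
    observed = alignment(tokens, keys)
    hits = 0
    for _ in range(n_draws):
        fresh = [rng.random() for _ in tokens]
        if alignment(tokens, fresh) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_draws)


def window_pvalues(tokens, keys, rng, width=10):
    """One p-value per sliding window of `width` consecutive tokens."""
    return [
        randomization_pvalue(tokens[s:s + width], keys[s:s + width], rng)
        for s in range(len(tokens) - width + 1)
    ]


def cusum_change_point(values):
    """Split index maximising the scaled difference of segment means."""
    n = len(values)
    total = sum(values)
    best_tau, best_stat, left = 1, -1.0, 0.0
    for tau in range(1, n):
        left += values[tau - 1]
        gap = abs(left / tau - (total - left) / (n - tau))
        stat = (tau * (n - tau) / n) ** 0.5 * gap
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau


def simulate(n=200, change=100, width=10, seed=0):
    """Build a half-watermarked toy text and estimate where the watermark ends."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n)]
    # The first `change` tokens follow the key (watermarked); the rest are
    # drawn independently of it, mimicking text written by the user.
    tokens = [int(k * VOCAB) for k in keys[:change]]
    tokens += [rng.randrange(VOCAB) for _ in range(n - change)]
    pvals = window_pvalues(tokens, keys, rng, width)
    # Shift from a window start index to the approximate window centre.
    estimate = cusum_change_point(pvals) + width // 2
    return pvals, estimate


if __name__ == "__main__":
    pvals, estimate = simulate()
    print("first window p-value:", pvals[0])
    print("estimated change location (token index):", estimate)
```

In this toy setting, windows that lie entirely in the watermarked part give p-values near the smallest attainable value, 1 / (1 + n_draws), while windows in the unwatermarked part give roughly uniform p-values. The split rule then places the change point where the p-value sequence shifts between those two regimes. A text with several watermarked segments would need a multiple change point procedure, which this sketch does not attempt.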

Code Implementations: 1 repo