LG CL CR · Apr 16, 2025

Watermarking Needs Input Repetition Masking

David Khachaturov, Robert Mullins, Ilia Shumailov, Sumanth Dathathri

DeepMind

arXiv:2504.12229v1 · 7.11 citations · h-index: 26

Originality: Incremental advance

AI Analysis

This work addresses the reliability of watermarking for LLM-generated text. It highlights a vulnerability that could lead to false positives; the contribution is rated incremental because it exposes a practical issue in existing methods.

The paper investigates conversational adaptation, which it terms mimicry. It shows that both humans and unwatermarked LLMs can unintentionally mimic properties of watermarked LLM-generated text. This challenges current assumptions and suggests that watermarking mechanisms need lower false-positive rates and longer word sequences for seeding.

Recent advancements in Large Language Models (LLMs) have raised concerns over potential misuse, such as spreading misinformation. In response, two countermeasures emerged: machine learning-based detectors that predict whether text is synthetic, and LLM watermarking, which subtly marks generated text for identification and attribution. Meanwhile, humans are known to adjust their language to their conversational partners, both syntactically and lexically. By implication, humans or unwatermarked LLMs could unintentionally mimic properties of LLM-generated text, making these countermeasures unreliable. In this work we investigate the extent to which such conversational adaptation happens. We call the concept $\textit{mimicry}$ and demonstrate that both humans and LLMs end up mimicking LLM-generated text, including the watermarking signal, even in seemingly improbable settings. This challenges current academic assumptions and suggests that, for long-term watermarking to be reliable, the likelihood of false positives needs to be significantly lower, and longer word sequences should be used for seeding watermarking mechanisms.
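To make the seeding argument concrete, here is a minimal sketch. It assumes a generic hash-based "green list" style watermark in which a token's watermark status is a deterministic function of the preceding `k` words plus the token itself; the abstract does not specify the scheme, so `is_green`, `copied_fraction`, `gamma`, and the toy sentences are all hypothetical. Under that assumption, any (k+1)-word sequence copied from watermarked text carries its watermark status with it, so the share of copied (k+1)-grams in a reply is a rough proxy for how much signal mimicry can transfer.

```python
import hashlib


def is_green(context, token, gamma=0.5):
    """Hypothetical watermark rule: hash the k preceding words plus the token.

    The token counts as "green" when the hash lands in the lowest `gamma`
    fraction of the hash range. This is an illustrative assumption, not the
    scheme used in the paper.
    """
    payload = (" ".join(context) + "|" + token).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def copied_fraction(reply, source, k):
    """Fraction of (k+1)-grams in `reply` that also occur in `source`.

    With a seed context of k words, each such copied (k+1)-gram has the same
    green/red status in the reply as it had in the source.
    """
    n = k + 1
    source_grams = {tuple(source[i:i + n]) for i in range(len(source) - n + 1)}
    reply_grams = [tuple(reply[i:i + n]) for i in range(len(reply) - n + 1)]
    if not reply_grams:
        return 0.0
    return sum(g in source_grams for g in reply_grams) / len(reply_grams)


# Toy stand-ins for a watermarked message and a reply that echoes its phrasing.
source = "the quick brown fox jumps over the lazy dog".split()
reply = "a quick brown fox sat near the lazy dog".split()

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, copied_fraction(reply, source, k))
```

In this toy example the reply reuses half of the source's word pairs but none of its four-word sequences, which illustrates why a longer seeding context leaves less room for unintentional mimicry to reproduce the signal.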
