How Good is Post-Hoc Watermarking With Language Model Rephrasing?
This work addresses the need to protect copyrighted documents and to detect their use in AI training or retrieval-augmented generation (RAG), but it is incremental, since it builds on existing generation-time watermarking methods.
The paper tackles the traceability of AI-generated content by exploring post-hoc watermarking, in which an LLM rewrites existing text while embedding statistical signals. It finds that strategies such as Gumbel-max with beam search achieve strong detectability and semantic fidelity on open-ended text, but struggle with verifiable text such as code.
Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking*, where an LLM rewrites existing text while applying generation-time watermarking, either to protect copyrighted documents or to detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and the limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
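To make the two sides of the Gumbel-max scheme and the entropy filtering at detection more concrete, here is a minimal Python sketch of the textbook Gumbel-max watermark. It is not the paper's implementation, and several pieces are assumptions chosen for illustration: a toy stand-in "language model" (`toy_next_token_probs`), a hypothetical secret key, per-step seeds derived by SHA-256 hashing of the previous two tokens, and a detector that can query the same model to compute entropies. At each step the generator picks the token maximizing r_v^(1/p_v), where r_v are keyed pseudo-random uniforms. The detector sums -log(1 - r) over the observed tokens; for unwatermarked text each term is Exp(1), so the sum follows Gamma(n, 1), which yields a p-value. Beam search, multi-candidate generation and the actual rephrasing prompt are not modeled.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50            # toy vocabulary size (assumption)
CONTEXT_WIDTH = 2          # previous tokens hashed into each step's seed (assumption)
SECRET_KEY = "demo-key"    # hypothetical watermark key


def toy_next_token_probs(context):
    """Stand-in for an LLM: a deterministic softmax that depends on the full prefix."""
    rng = random.Random(str(tuple(context)))
    logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def uniforms(context, key=SECRET_KEY):
    """Keyed pseudo-random values r_v in (0, 1), one per token, seeded by the recent context."""
    material = f"{key}|{tuple(context[-CONTEXT_WIDTH:])}".encode()
    rng = random.Random(int.from_bytes(hashlib.sha256(material).digest()[:8], "big"))
    return [rng.random() or 1e-12 for _ in range(VOCAB_SIZE)]


def gumbel_max_choice(probs, r):
    """Gumbel-max watermark: argmax_v r_v ** (1 / p_v), computed as argmax log(r_v) / p_v."""
    return max(range(len(probs)), key=lambda v: math.log(r[v]) / probs[v])


def generate_watermarked(n_tokens):
    tokens = [0] * CONTEXT_WIDTH  # fixed start prefix (assumption)
    for _ in range(n_tokens):
        probs = toy_next_token_probs(tokens)
        tokens.append(gumbel_max_choice(probs, uniforms(tokens)))
    return tokens


def generate_plain(n_tokens, seed=1234):
    """Ordinary sampling from the same toy model, without a watermark."""
    rng = random.Random(seed)
    tokens = [0] * CONTEXT_WIDTH
    for _ in range(n_tokens):
        probs = toy_next_token_probs(tokens)
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=probs)[0])
    return tokens


def gamma_upper_tail(s, n):
    """P(S >= s) for S ~ Gamma(n, 1) with integer n: e^{-s} * sum_{k<n} s^k / k!, in log space."""
    if n == 0 or s <= 0:
        return 1.0
    log_terms = [k * math.log(s) - math.lgamma(k + 1) for k in range(n)]
    top = max(log_terms)
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return min(1.0, math.exp(log_sum - s))


def detect(tokens, entropy_threshold=0.0):
    """Return (score, scored_tokens, p_value); positions below the entropy threshold are skipped."""
    seen, score, n = set(), 0.0, 0
    for i in range(CONTEXT_WIDTH, len(tokens)):
        context, tok = tokens[:i], tokens[i]
        pair = (tuple(context[-CONTEXT_WIDTH:]), tok)
        if pair in seen:  # score each (context window, token) pair only once
            continue
        seen.add(pair)
        if entropy_threshold > 0:
            probs = toy_next_token_probs(context)
            entropy = -sum(p * math.log(p) for p in probs if p > 0)
            if entropy < entropy_threshold:  # near-deterministic step: little signal
                continue
        score += -math.log(1.0 - uniforms(context)[tok])  # Exp(1) under the null
        n += 1
    return score, n, gamma_upper_tail(score, n)


if __name__ == "__main__":
    print("watermarked p-value:", detect(generate_watermarked(200))[2])
    print("plain p-value:", detect(generate_plain(200))[2])
```

In this sketch the watermarked sequence should give a p-value near zero, while the plain sample should not. Raising `entropy_threshold` scores fewer tokens but keeps those that carry more signal, which is the kind of detection-side compute trade-off the abstract refers to. Because Gumbel-max is deterministic given the context window, repeated windows produce repeated choices. This is why repeated (window, token) pairs are scored only once, which keeps the Gamma(n, 1) null valid.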