CL · LG · Jun 30, 2023

Provable Robust Watermarking for AI-Generated Text

Berkeley · CMU
arXiv:2306.17439v2 · 330 citations · h-index: 60 · Has Code
AI Analysis

This work addresses safety and responsible use issues for LLM users by providing a robust watermarking method, though it is incremental as it extends an existing approach.

The paper tackles the problem of watermarking AI-generated text to address safety challenges in large language models (LLMs). It proposes Unigram-Watermark, which achieves superior detection accuracy and comparable generation quality, measured by perplexity, on three LLMs and two datasets.

We study the problem of watermarking text generated by large language models (LLMs), one of the most promising approaches for addressing the safety challenges of LLM usage. In this paper, we propose a rigorous theoretical framework to quantify the effectiveness and robustness of LLM watermarks. We propose a robust and high-quality watermark method, Unigram-Watermark, by extending an existing approach with a simplified fixed grouping strategy. We prove that our watermark method guarantees generation quality and correctness in watermark detection, and that it is robust against text editing and paraphrasing. Experiments on three different LLMs and two datasets verify that our Unigram-Watermark achieves superior detection accuracy and comparable generation quality, measured by perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark.
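
To make the "fixed grouping strategy" concrete, here is a minimal Python sketch of a watermark with a fixed green list. It is an illustration under stated assumptions, not the authors' implementation: the function names (`green_list`, `watermark_logits`, `detect_z`), the key-derived shuffle, and the default values `gamma=0.5` and `delta=2.0` are all hypothetical choices made for this example. The sketch assumes the vocabulary is split once, using a secret key, into a "green" subset covering a fraction `gamma` of the tokens, that generation adds `delta` to the logits of green tokens, and that detection computes a z-score on the number of green tokens in a text.

```python
import hashlib
import math
import random


def green_list(vocab_size, gamma=0.5, key=b"secret"):
    """Return a fixed set of 'green' token ids derived from a secret key.

    Assumption: the partition depends only on the key, not on preceding tokens.
    """
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits, green, delta=2.0):
    """Add delta to the logit of every green token; leave the rest unchanged."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def detect_z(tokens, green, gamma=0.5):
    """z-score of the green-token count against the no-watermark expectation."""
    n = len(tokens)
    hits = sum(t in green for t in tokens)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

In this sketch, a text would be flagged as watermarked when `detect_z` exceeds a chosen threshold. Because the green list does not depend on context, editing one token changes the green count by at most one, which gives some intuition for the robustness to editing that the abstract claims.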
