LG, CL, CR. Oct 27, 2023

Publicly-Detectable Watermarking for Language Models

arXiv:2310.18491v4, 87 citations, h-index: 46
Originality: Incremental advance
AI Analysis

This paper addresses the need for verifiable attribution of AI-generated text. The advance is incremental: it builds on prior watermarking methods and adds public detectability.

The authors develop a publicly-detectable watermarking scheme for language model output. The scheme embeds a cryptographic signature that anyone can verify without secret information, yields unforgeable and distortion-free text, and uses error correction to cope with stretches of low entropy.

We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
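The rejection-sampling idea in the abstract can be illustrated with a toy sketch. This is not the authors' construction: the vocabulary, the block length, the `block_bit` hash, and the random sampler standing in for a language model are all assumptions made for illustration, and the payload is a fixed bit list where the paper would use the bits of a publicly-verifiable signature. Error correction is left out.

```python
import hashlib
import random

# Toy vocabulary and block length; both are illustrative assumptions.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "by", "mat", "road"]
BLOCK_LEN = 4  # tokens used to carry one payload bit


def block_bit(tokens):
    """Public, keyless map from a block of tokens to a single bit."""
    digest = hashlib.sha256(" ".join(tokens).encode("utf-8")).digest()
    return digest[0] & 1


def sample_block(rng):
    """Stand-in for drawing BLOCK_LEN tokens from a language model."""
    return [rng.choice(VOCAB) for _ in range(BLOCK_LEN)]


def embed(bits, rng, max_tries=1000):
    """Resample each block until its hash bit equals the payload bit."""
    tokens = []
    for bit in bits:
        for _ in range(max_tries):
            block = sample_block(rng)
            if block_bit(block) == bit:
                tokens.extend(block)
                break
        else:
            # With too little entropy no block may ever match; this is
            # the failure mode the paper addresses with error correction.
            raise RuntimeError("could not embed bit: too little entropy")
    return tokens


def extract(tokens):
    """Recover the payload using only public information (the hash)."""
    return [
        block_bit(tokens[i:i + BLOCK_LEN])
        for i in range(0, len(tokens), BLOCK_LEN)
    ]


# In the paper the payload would be signature bits; here it is a fixed list.
payload = [1, 0, 1, 1, 0, 0, 1, 0]
text_tokens = embed(payload, random.Random(0))
recovered = extract(text_tokens)
print(" ".join(text_tokens))
print(recovered == payload)
```

Every accepted block is an ordinary draw from the sampler; the payload only decides which draw is kept. Extraction uses nothing but the public hash, which mirrors the abstract's requirement that detection contains no secret information.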

Code Implementations: 1 repo
