LG · CL · CR · Oct 27, 2023

Publicly-Detectable Watermarking for Language Models

Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Mingyuan Wang

arXiv:2310.18491v4 · 28.391 citations · h-index: 46 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for verifiable attribution of AI-generated text. It is an incremental advance: it builds on prior watermarking methods and focuses on making detection public.

The authors tackle the problem of watermarking language model outputs with a publicly-detectable scheme. A cryptographic signature is embedded into the generated text, and detection requires no secret information. The resulting text is unforgeable and distortion-free, and error correction is used to cope with periods of low entropy.
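Why error correction helps can be shown with the simplest possible code, a repetition code with majority decoding. When the model's output has little entropy, the generator may be unable to steer a stretch of text toward the bit it is supposed to carry, so a few embedded bits come out wrong. Redundancy lets the detector recover the message anyway. This is a minimal sketch only. The repetition code is a stand-in chosen for clarity, not the code used in the paper, and the names `rep_encode`, `rep_decode` and the factor `r` are hypothetical.

```python
def rep_encode(bits, r=5):
    """Repeat each message bit r times (r odd, so majority votes never tie)."""
    if r % 2 == 0:
        raise ValueError("r must be odd")
    return [b for b in bits for _ in range(r)]


def rep_decode(coded, r=5):
    """Majority-vote each group of r received bits back to one message bit."""
    out = []
    for i in range(0, len(coded) - r + 1, r):
        group = coded[i:i + r]
        # The bit is 1 if more than half of the group is 1.
        out.append(1 if 2 * sum(group) > r else 0)
    return out


if __name__ == "__main__":
    message = [1, 0, 1, 1]
    received = rep_encode(message)
    # Simulate two bits corrupted during a low-entropy stretch.
    received[0] ^= 1
    received[1] ^= 1
    print(rep_decode(received))
```

With `r = 5`, each group of five tolerates up to two flipped bits. Practical schemes use far more efficient codes, but the trade-off is the same: spend extra embedded bits to survive positions where embedding fails.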

We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
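The rejection-sampling step can be sketched in a toy setting. To carry one bit, the generator repeatedly samples a block of tokens until a public hash of the block yields that bit. Anyone can then re-hash the blocks to read the bits back. This is a simplified illustration under stated assumptions, not the authors' implementation. The "model" is a hypothetical uniform sampler over a tiny vocabulary. Using one SHA-256 bit per block is an assumption. The bits to embed are taken as given, whereas in the paper they come from a public-key signature that a verifier would check after extraction.

```python
import hashlib
import random

# Hypothetical toy "language model": samples tokens uniformly from a tiny vocabulary.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slowly"]


def toy_sample_block(rng, block_len):
    """Draw one block of tokens from the toy model."""
    return [rng.choice(VOCAB) for _ in range(block_len)]


def block_bit(block):
    """Publicly map a token block to one bit: the low bit of its SHA-256 digest."""
    digest = hashlib.sha256(" ".join(block).encode("utf-8")).digest()
    return digest[0] & 1


def embed_bits(bits, rng, block_len=3, max_tries=64):
    """Rejection sampling: resample each block until its hash bit equals the target bit."""
    tokens = []
    for target in bits:
        block = toy_sample_block(rng, block_len)
        tries = 1
        while block_bit(block) != target and tries < max_tries:
            block = toy_sample_block(rng, block_len)
            tries += 1
        # If no match was found (e.g. low entropy), the last block is kept and this
        # bit is embedded incorrectly; error correction has to absorb such errors.
        tokens.extend(block)
    return tokens


def extract_bits(tokens, block_len=3):
    """Detector side: needs only the public hash, no secret key."""
    return [block_bit(tokens[i:i + block_len])
            for i in range(0, len(tokens) - block_len + 1, block_len)]


if __name__ == "__main__":
    rng = random.Random(0)
    payload = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for signature bits
    text = embed_bits(payload, rng)
    print(" ".join(text))
    print(extract_bits(text))
```

In this toy, each accepted block is a sample from the model conditioned on its hash bit. If the hash behaves like a random function and the embedded bits look random, the marginal distribution of the text stays close to the model's own, which is the intuition behind the distortion-free claim. If the vocabulary shrank to a single token, every block would hash to the same bit, which illustrates the low-entropy failure mode.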

