ML CL CR LG ME | Nov 17, 2024

Debiasing Watermarks for Large Language Models via Maximal Coupling

Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang

arXiv:2411.11203v2 | 19.017 citations | h-index: 7 | Has Code | J Am Stat Assoc

Originality: Incremental advance

AI Analysis

The paper offers a way to watermark language-model output, which helps maintain the integrity of digital communication. The advance is incremental because it builds on existing watermarking techniques.

The paper addresses watermarking of language models so that machine-generated text can be distinguished from human-written text. It introduces a green/red list approach that uses maximal coupling for bias correction. The reported results are better text quality than prior techniques, high detectability, and resilience to targeted modifications aimed at improving text quality.

Watermarking language models is essential for distinguishing between human and machine-generated text and thus maintaining the integrity and trustworthiness of digital communication. We present a novel green/red list watermarking approach that partitions the token set into "green" and "red" lists, subtly increasing the generation probability for green tokens. To correct token distribution bias, our method employs maximal coupling, using a uniform coin flip to decide whether to apply bias correction, with the result embedded as a pseudorandom watermark signal. Theoretical analysis confirms this approach's unbiased nature and robust detection capabilities. Experimental results show that it outperforms prior techniques by preserving text quality while maintaining high detectability, and it demonstrates resilience to targeted modifications aimed at improving text quality. This research provides a promising watermarking solution for language models, balancing effective detection with minimal impact on text quality.
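
To make the idea of debiasing via maximal coupling concrete, here is a minimal sketch of the textbook maximal-coupling sampler applied to a green-boosted distribution. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`boost_green`, `coupled_sample`, `exact_marginal`), the exponential boost `delta`, and the toy four-token vocabulary are all hypothetical. The coin here comes from an ordinary random generator; in a real watermark it would presumably be derived pseudorandomly from a secret key so that a detector can recompute it.

```python
import math
import random

def boost_green(p, green, delta):
    """Up-weight green tokens by exp(delta) and renormalize (biased distribution q)."""
    weights = [pi * math.exp(delta) if i in green else pi for i, pi in enumerate(p)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(dist, u):
    """Inverse-CDF sampling from a discrete distribution using u in [0, 1)."""
    acc = 0.0
    for i, d in enumerate(dist):
        acc += d
        if u < acc:
            return i
    return len(dist) - 1  # guard against floating-point round-off

def residual(p, q):
    """Normalized positive part of p - q, used by the correction step."""
    r = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(r)
    return [ri / total for ri in r] if total > 0 else list(p)

def coupled_sample(p, q, rng):
    """Draw a token with marginal p that agrees with a q-draw as often as possible.

    Returns (token, coin, corrected), where `corrected` says whether the
    bias-correction branch was taken.
    """
    x = sample_index(q, rng.random())   # proposal from the boosted distribution
    coin = rng.random()                 # uniform coin flip
    if coin * q[x] <= p[x]:             # accept with probability min(1, p[x] / q[x])
        return x, coin, False
    return sample_index(residual(p, q), rng.random()), coin, True

def exact_marginal(p, q):
    """Closed-form output distribution of coupled_sample, to check unbiasedness."""
    overlap = [min(pi, qi) for pi, qi in zip(p, q)]
    miss = 1.0 - sum(overlap)           # probability of taking the correction branch
    r = residual(p, q)
    return [o + miss * ri for o, ri in zip(overlap, r)]

# Toy example (assumed values): four tokens, tokens 1 and 3 are "green".
p = [0.5, 0.2, 0.2, 0.1]
green = {1, 3}
q = boost_green(p, green, delta=1.0)
marginal = exact_marginal(p, q)
```

In this sketch `marginal` matches `p` up to floating-point error, which is the sense in which the coupled sampler is unbiased: the boost toward green tokens changes how a token is chosen but not the distribution it is drawn from. The correction branch only ever emits tokens whose original probability exceeds the boosted one, which under this boost means red tokens.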
