CL | Feb 20, 2024

GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick

Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao

Berkeley

arXiv:2402.12948v3 | 19.941 citations | h-index: 24 | Has Code | ACL

Originality: Incremental advance

AI Analysis

This work addresses poor user experience and reduced diversity in watermarked text generation for applications like content moderation, but it is incremental, as it builds on existing watermarking techniques.

The paper tackles the problem of low generation diversity in GumbelMax-trick-based watermarking for large language models, which yields identical outputs for the same prompt. It proposes the GumbelSoft watermark, whose AUROC scores in high-diversity settings outperform those of the two alternative variants by 0.1 to 0.3 and those of other decoding-based methods by at least 0.1.

Large language models (LLMs) excel at generating human-like text, but also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermarking, particularly the GumbelMax-trick-based watermark (GM watermark), is a standout solution for safeguarding machine-generated texts due to its notable detectability. However, the GM watermark faces a major challenge with generation diversity: it always yields identical outputs for the same prompt, which harms user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high-diversity settings, with its AUROC score outperforming those of the two alternative variants by 0.1 to 0.3 and surpassing other decoding-based watermarking methods by a minimum of 0.1.
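The diversity problem described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the hashing scheme, the secret `key`, the temperature `tau`, and all function names are hypothetical, chosen only to show why taking an argmax over logits plus keyed Gumbel noise is deterministic for a fixed context, and how sampling from a softmax over the same noised logits could restore randomness.

```python
import hashlib
import math
import random


def keyed_uniforms(key, context, vocab_size):
    """Derive one pseudorandom uniform per token from a key and the context.

    Assumption: a SHA-256 hash of key + context seeds the generator.
    """
    payload = (key + "|" + ",".join(map(str, context))).encode()
    seed = int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")
    rng = random.Random(seed)
    # Clamp strictly inside (0, 1) so the double logarithm below is finite.
    return [min(max(rng.random(), 1e-12), 1 - 1e-12) for _ in range(vocab_size)]


def gumbel_noise(uniforms):
    """Map uniforms to standard Gumbel samples: g = -log(-log(u))."""
    return [-math.log(-math.log(u)) for u in uniforms]


def gumbel_max_token(logits, key, context):
    """Argmax of logits + keyed Gumbel noise: same context, same token."""
    noise = gumbel_noise(keyed_uniforms(key, context, len(logits)))
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


def gumbel_soft_token(logits, key, context, tau, rng):
    """Sample from softmax((logits + keyed noise) / tau) instead of argmax."""
    noise = gumbel_noise(keyed_uniforms(key, context, len(logits)))
    scores = [(l + g) / tau for l, g in zip(logits, noise)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.3, -1.0]   # toy next-token logits
context = (17, 42)               # toy preceding token ids

first = gumbel_max_token(logits, "secret", context)
second = gumbel_max_token(logits, "secret", context)
print(first == second)           # the argmax variant repeats itself

sampler = random.Random(0)
soft = [gumbel_soft_token(logits, "secret", context, 1.0, sampler) for _ in range(10)]
print(soft)                      # the softmax variant may vary between calls
```

In this sketch, a very small `tau` makes the softmax variant collapse back to the argmax choice, while a larger `tau` spreads probability over more tokens, which presumably trades some detectability for diversity.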

