CL · Jul 30, 2023

An Unforgeable Publicly Verifiable Watermark for Large Language Models

Tsinghua
arXiv:2307.16230v7 · 69 citations · h-index: 26 · Has Code
Originality: Highly original
AI Analysis

This paper addresses secure, public verification of AI-generated text, with applications such as countering fake news and copyright infringement. It proposes a novel method for a known bottleneck.

The paper targets a security weakness in existing text watermarking algorithms for large language models: detection requires the same secret key that was used for generation. It proposes an unforgeable publicly verifiable watermark algorithm (UPV) that uses separate neural networks for generation and detection, and it reports high detection accuracy and computational efficiency.

Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which lets the detection network reach high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network. Our code is available at https://github.com/THU-BPM/unforgeable_watermark. Additionally, our algorithm can also be accessed through MarkLLM (Pan et al., 2024): https://github.com/THU-BPM/MarkLLM.
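
The key-sharing limitation described in the abstract can be illustrated with a toy example. The sketch below is not UPV and is not taken from the authors' repository. It is a simplified key-based "green list" watermark of the kind the abstract criticizes, written under several assumptions: the names `is_green`, `generate`, and `z_score` are hypothetical, the vocabulary is a toy range of integers, and a random sampler stands in for an LLM.

```python
import hashlib
import hmac
import math
import random

VOCAB_SIZE = 1000  # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5        # expected fraction of "green" tokens at each step (assumption)


def is_green(key: bytes, prev_token: int, token: int) -> bool:
    """Keyed pseudo-random green/red label for `token` given the previous token."""
    msg = f"{prev_token}:{token}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Map the first 8 bytes of the MAC to [0, 1) and threshold at GAMMA.
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def generate(key: bytes, length: int, rng: random.Random) -> list[int]:
    """Stand-in for a watermarked LLM: rejection-samples green tokens only."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < length:
        candidate = rng.randrange(VOCAB_SIZE)
        if is_green(key, tokens[-1], candidate):
            tokens.append(candidate)
    return tokens


def z_score(key: bytes, tokens: list[int]) -> float:
    """Detection statistic: how far the green count exceeds its chance level."""
    n = len(tokens) - 1  # number of scored (previous, current) pairs
    green = sum(is_green(key, p, t) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


secret = b"generation-key"
text = generate(secret, 201, random.Random(0))
z_right = z_score(secret, text)            # detector holds the generation key
z_wrong = z_score(b"some-other-key", text)  # detector holds a different key
print(f"z with the generation key: {z_right:.2f}")
print(f"z with a different key:    {z_wrong:.2f}")
```

In this toy scheme, detection only succeeds when `z_score` is given the same key that `generate` used, and anyone holding that key can call `generate` to forge watermarked text. That coupling is the weakness the abstract describes. UPV, according to the abstract, avoids it by using a separate detection network rather than publishing the generation key.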

Code Implementations: 3 repos