CL · MM · Mar 15, 2024

Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs

arXiv:2403.10020v3 · 11 citations · h-index: 2 · NAACL
Originality: Incremental advance
AI Analysis

The paper addresses a critical issue for LLM users and developers, namely text copyright protection, but the advance is incremental because it builds on existing attack methods.

The paper tackles watermark collision in logit-based watermarking for LLMs: as watermarking becomes widespread, common tasks such as paraphrasing can layer one watermark over another. It demonstrates that this collision threatens all such algorithms and affects downstream applications.

The proliferation of large language models (LLMs) in generating content raises concerns about text copyright. Watermarking methods, particularly logit-based approaches, embed imperceptible identifiers into text to address these challenges. However, the widespread usage of watermarking across diverse LLMs has led to an inevitable issue known as watermark collision during common tasks, such as paraphrasing or translation. In this paper, we introduce watermark collision as a novel and general philosophy for watermark attacks, aimed at enhancing attack performance on top of any other attacking methods. We also provide a comprehensive demonstration that watermark collision poses a threat to all logit-based watermark algorithms, impacting not only specific attack scenarios but also downstream applications.
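To make the idea of a collision concrete, here is a minimal toy sketch. It is not the paper's implementation. It assumes a green-list style logit watermark in which a secret key and the previous token select a "green" half of the vocabulary, and it uses a hard watermark (only green tokens are ever emitted) as a stand-in for a large logit bias. The names `is_green`, `generate`, `paraphrase`, and `z_score`, the toy vocabulary, and the 50% rewrite rate are all hypothetical choices made for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000   # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5         # fraction of the vocabulary that is "green" (assumption)


def is_green(key: str, prev_token: int, token: int) -> bool:
    """Pseudo-randomly mark a token as green, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] / 256 < GAMMA


def sample_green(key: str, prev_token: int, rng: random.Random) -> int:
    """Draw uniformly random tokens until one is green under `key` (hard watermark)."""
    while True:
        token = rng.randrange(VOCAB_SIZE)
        if is_green(key, prev_token, token):
            return token


def generate(key: str, length: int, rng: random.Random) -> list[int]:
    """Emit a start token followed by `length` tokens watermarked with `key`."""
    tokens = [0]
    for _ in range(length):
        tokens.append(sample_green(key, tokens[-1], rng))
    return tokens


def paraphrase(tokens: list[int], key: str, rate: float, rng: random.Random) -> list[int]:
    """Toy paraphraser: rewrite each token with probability `rate`,
    embedding a second watermark under `key` in the rewritten tokens."""
    out = [tokens[0]]
    for token in tokens[1:]:
        if rng.random() < rate:
            out.append(sample_green(key, out[-1], rng))
        else:
            out.append(token)
    return out


def z_score(key: str, tokens: list[int]) -> float:
    """Standard one-proportion z-test on the number of green tokens under `key`."""
    scored = len(tokens) - 1
    green = sum(is_green(key, tokens[i - 1], tokens[i]) for i in range(1, len(tokens)))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


rng = random.Random(0)
original = generate("key-A", 200, rng)                # text carrying watermark A
collided = paraphrase(original, "key-B", 0.5, rng)    # paraphrased with watermark B

z_a_before = z_score("key-A", original)
z_a_after = z_score("key-A", collided)
z_b_after = z_score("key-B", collided)

print(f"watermark A, before paraphrase: z = {z_a_before:.2f}")
print(f"watermark A, after paraphrase:  z = {z_a_after:.2f}")
print(f"watermark B, after paraphrase:  z = {z_b_after:.2f}")
```

In this toy setting the detector for watermark A scores the original text far above chance, and the score falls once a second watermarked model rewrites part of the text. The real effect on actual LLMs depends on the watermarking algorithm, bias strength, and paraphraser, which this sketch does not model.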

