CR · CL · May 22, 2024

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

arXiv:2405.13517v1 · 10.76 citations · h-index: 7

Originality: Incremental advance

AI Analysis

The paper addresses the misuse of LLMs and the need to trace their outputs, a concern for both users and developers. It is an incremental improvement that enhances existing watermarking techniques.

The paper tackles the trade-off among imperceptibility, efficacy, and robustness in watermarking large language models. It introduces WaterPool, a key module (the component that samples watermark keys during generation and restores them during detection) that plugs into existing methods and improves their efficacy and robustness, with gains such as +12.73% for KGW and +20.27% for EXP.

With the increasing use of large language models (LLMs) in daily life, concerns have emerged regarding their potential misuse and societal impact. Watermarking is proposed to trace the usage of specific models by injecting patterns into their generated texts. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility), while ensuring a high detection rate (efficacy), even when the text is partially altered (robustness). Despite many methods having been proposed, none have simultaneously achieved all three properties, revealing an inherent trade-off. This paper utilizes a key-centered scheme to unify existing watermarking techniques by decomposing a watermark into two distinct modules: a key module and a mark module. Through this decomposition, we demonstrate for the first time that the key module significantly contributes to the trade-off issues observed in prior methods. Specifically, this reflects the conflict between the scale of the key sampling space during generation and the complexity of key restoration during detection. To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space required by imperceptibility while utilizing semantics-based search to improve the key restoration process. WaterPool can integrate with most watermarks, acting as a plug-in. Our experiments with three well-known watermarking techniques show that WaterPool significantly enhances their performance, achieving near-optimal imperceptibility and markedly improving efficacy and robustness (+12.73% for KGW, +20.27% for EXP, +7.27% for ITS).
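
The key-module idea in the abstract can be illustrated with a rough sketch. This is not the paper's implementation: the class `KeyPool`, its methods, and the toy `embed` function are hypothetical names introduced here, and the bag-of-words hashing embedding is only a stand-in for whatever semantic encoder a real system would use. The sketch assumes that each generation draws a fresh random key (so the key sampling space stays unrestricted), that the key is stored alongside an embedding of the generated text, and that detection retrieves candidate keys by similarity search instead of trying every possible key. The mark module, which would embed and verify the watermark given a key, is omitted.

```python
import hashlib
import math
import random


def embed(text, dim=64):
    """Toy hashing embedding; a stand-in for a real semantic encoder (assumption)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


class KeyPool:
    """Hypothetical key module: random key sampling plus similarity-based restoration."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._entries = []  # list of (embedding, key) pairs

    def sample_key(self):
        # Draw from the full 64-bit space; the key is not derived from the text.
        return self._rng.getrandbits(64)

    def register(self, text, key):
        # Remember which key was used, indexed by the text's embedding.
        self._entries.append((embed(text), key))

    def restore_candidates(self, text, top_k=1):
        # Rank stored keys by similarity between the query and the stored texts.
        query = embed(text)
        ranked = sorted(self._entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [key for _, key in ranked[:top_k]]


pool = KeyPool(seed=0)
key_a = pool.sample_key()
pool.register("the cat sat on the mat near the door", key_a)
key_b = pool.sample_key()
pool.register("stock markets fell sharply after an earnings announcement", key_b)

# A lightly edited copy of the first text should still retrieve its key.
candidates = pool.restore_candidates("the cat sat on a mat near the door", top_k=1)
```

In a full pipeline, each candidate key returned by `restore_candidates` would then be passed to the detector of the underlying watermark (for example KGW, EXP, or ITS) to decide whether the text is marked.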

