Watermarks for Language Models via Probabilistic Automata
This work addresses watermarking for language models and offers improved generation diversity and detection efficiency, but it appears incremental, since it builds on prior schemes to relieve specific bottlenecks.
The paper tackles two limitations of language model watermarking, limited generation diversity and high detection overhead, by introducing a new class of schemes based on probabilistic automata. The practical instantiation achieves exponential generation diversity and computational efficiency, and experiments on models such as LLaMA-3B and Mistral-7B show superior robustness and efficiency.
A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high detection overhead. In parallel, recent research has focused on undetectability, a property ensuring that watermarks remain difficult for adversaries to detect and spoof. In this work, we introduce a new class of watermarking schemes constructed through probabilistic automata. We present two instantiations: (i) a practical scheme with exponential generation diversity and computational efficiency, and (ii) a theoretical construction with formal undetectability guarantees under cryptographic assumptions. Extensive experiments on LLaMA-3B and Mistral-7B validate the superior performance of our scheme in terms of robustness and efficiency.
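The abstract does not spell out the construction, so the following is only a minimal sketch of one plausible reading, built entirely on our own assumptions. The pseudorandom keys are attached to the states of a small hypothetical automaton. Generation walks the automaton with fresh, non-keyed randomness, so the number of possible key sequences grows exponentially with length. Each token is chosen by exponential-minimum sampling, a known distortion-free technique. Detection maximizes a score over all automaton paths with a Viterbi-style dynamic program. `TRANSITIONS`, `key_uniforms`, `exp_min_sample`, `generate`, `detect_score`, `toy_model`, and the secret are hypothetical names for illustration, not the paper's API or its actual scheme.

```python
import hashlib
import math
import random

# Hypothetical automaton: state -> possible next states. With out-degree 2 the
# number of state (and hence key) sequences grows exponentially: 4 * 2**(T-1) for T steps.
TRANSITIONS = {0: [1, 2], 1: [2, 3], 2: [3, 0], 3: [0, 1]}


def key_uniforms(secret: bytes, state: int, vocab_size: int) -> list[float]:
    """Keyed pseudorandom values in (0, 1), one per token, for one automaton state."""
    rng = random.Random(hashlib.sha256(secret + state.to_bytes(4, "big")).digest())
    return [max(rng.random(), 1e-12) for _ in range(vocab_size)]


def exp_min_sample(probs: list[float], uniforms: list[float]) -> int:
    """Exponential-minimum sampling: return argmin_i -log(u_i) / p_i.
    If the u_i are fresh uniforms, token i is returned with probability p_i."""
    best, best_val = -1, math.inf
    for i, (p, u) in enumerate(zip(probs, uniforms)):
        if p > 0 and -math.log(u) / p < best_val:
            best, best_val = i, -math.log(u) / p
    return best


def generate(next_probs, secret: bytes, length: int, walk_rng: random.Random) -> list[int]:
    """Watermarked generation: the walk uses non-keyed randomness (diversity),
    while each token is sampled with the current state's keyed values."""
    tokens, state = [], walk_rng.choice(sorted(TRANSITIONS))
    for _ in range(length):
        probs = next_probs(tokens)
        tokens.append(exp_min_sample(probs, key_uniforms(secret, state, len(probs))))
        state = walk_rng.choice(TRANSITIONS[state])
    return tokens


def detect_score(tokens: list[int], secret: bytes, vocab_size: int) -> float:
    """Max over automaton paths of sum_t -log(1 - u_{s_t}[token_t]); the path search
    is a Viterbi-style dynamic program taking O(len(tokens) * number_of_edges) steps."""
    tables = {s: key_uniforms(secret, s, vocab_size) for s in TRANSITIONS}
    best = {s: -math.log(1 - tables[s][tokens[0]]) for s in TRANSITIONS}
    for tok in tokens[1:]:
        nxt = {s: -math.inf for s in TRANSITIONS}
        for prev, val in best.items():
            for s in TRANSITIONS[prev]:
                nxt[s] = max(nxt[s], val - math.log(1 - tables[s][tok]))
        best = nxt
    return max(best.values())


VOCAB = 100


def toy_model(prefix: list[int]) -> list[float]:
    """Hypothetical stand-in for an LM: a fixed random distribution per position."""
    rng = random.Random(len(prefix))
    weights = [rng.random() + 1e-6 for _ in range(VOCAB)]
    total = sum(weights)
    return [w / total for w in weights]


secret = b"demo-key"
wm = generate(toy_model, secret, 30, random.Random(0))
wm_other_walk = generate(toy_model, secret, 30, random.Random(1))

# Unwatermarked baseline: ordinary sampling from the same toy model.
plain_rng, plain = random.Random(2), []
for _ in range(30):
    plain.append(plain_rng.choices(range(VOCAB), weights=toy_model(plain))[0])

print(detect_score(wm, secret, VOCAB), detect_score(plain, secret, VOCAB))
```

In this sketch, two generations with the same key but different walks produce different outputs, while the detector, knowing only the key and the automaton, still scores watermarked text well above unwatermarked text without trying every possible key offset. Whether the paper's detector works this way, or what its automaton looks like, is not stated in the abstract.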