Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark
This work addresses the need for practical multibit watermarks in deployed LLMs, which must reliably encode complex payloads such as user IDs or timestamps.
The paper introduces a binomial multibit LLM watermark that encodes every payload bit at every token position. Evaluated against 8 baselines on payloads of up to 64 bits, it achieves superior message accuracy and robustness, with the largest gains for large payloads and low-distortion regimes.
With LLM watermarking already being deployed commercially, practical applications increasingly require multibit watermarks that encode more complex payloads, such as user IDs or timestamps, into the generated text. In this work, we propose a fundamentally new approach to multibit watermarking: binomial encoding, which directly encodes every bit of the payload at every token position. We complement this approach with a stateful encoder that, during generation, dynamically redirects encoding pressure toward under-encoded bits. Our evaluation against 8 baselines on payloads of up to 64 bits shows that our scheme achieves superior message accuracy and robustness, with the gap to baseline methods widening in the more practically relevant settings (i.e., large payloads and low-distortion regimes). We also challenge the evaluation metrics used in prior work, highlighting their lack of practical insight, and introduce per-bit confidence scoring as a practically relevant metric for evaluating multibit LLM watermarks.
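To make the idea of per-bit confidence concrete, the following is a toy sketch only, not the scheme or the metric defined in the paper. It assumes, purely for illustration, that a detector obtains one binary "vote" per payload bit from each token, and that on unwatermarked text each vote behaves like a fair coin flip. Under those assumptions, a bit can be decoded by majority vote, and a binomial tail probability indicates how surprising that majority would be by chance. The names `binomial_tail` and `decode_bit` are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int) -> float:
    """Return P[X >= k] for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def decode_bit(votes_for_one: int, n_tokens: int) -> tuple[int, float]:
    """Majority-decode one payload bit from per-token votes.

    Assumption (illustrative, not from the paper): each of the n_tokens
    tokens contributes one vote for this bit, and without a watermark
    each vote is an independent fair coin flip.

    Returns the decoded bit and a two-sided p-value; a smaller p-value
    means the majority is less likely to be a chance artifact.
    """
    votes_for_zero = n_tokens - votes_for_one
    bit = 1 if votes_for_one >= votes_for_zero else 0  # ties decode to 1
    majority = max(votes_for_one, votes_for_zero)
    # Two-sided test: a lopsided majority in either direction counts.
    p_value = min(1.0, 2 * binomial_tail(majority, n_tokens))
    return bit, p_value


if __name__ == "__main__":
    # Hypothetical vote counts for a 4-bit payload observed over 200 tokens.
    votes = [150, 62, 104, 131]
    for index, v in enumerate(votes):
        bit, p = decode_bit(v, 200)
        print(f"bit {index}: decoded={bit} p-value={p:.3g}")
```

Because every bit receives evidence from every token in this toy model, the number of votes per bit does not shrink as the payload grows. A score of this kind also reports which individual bits are trustworthy, rather than a single aggregate accuracy figure.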