A Unified Framework for LLM Watermarks

Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev

arXiv:2602.06754v15.31 citations, h-index: 64

Originality: Incremental advance

AI Analysis

This paper provides a unified framework for researchers and practitioners in AI watermarking, though the advance is incremental because it builds on existing methods.

The authors address the lack of a general formulation for LLM watermarking by showing that most existing schemes can be derived from a principled constrained optimization problem. This formulation unifies existing methods and reveals trade-offs such as the one between quality, diversity, and detection power. Experiments validate it: schemes derived from a given constraint maximize detection power with respect to that constraint.

LLM watermarks allow tracing AI-generated texts by inserting a detectable signal into their generated content. Recent works have proposed a wide range of watermarking algorithms, each with distinct designs, usually built using a bottom-up approach. Crucially, there is no general and principled formulation for LLM watermarking. In this work, we show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our formulation unifies existing watermarking methods and explicitly reveals the constraints that each method optimizes. In particular, it highlights an understudied quality-diversity-power trade-off. At the same time, our framework also provides a principled approach for designing novel watermarking schemes tailored to specific requirements. For instance, it allows us to directly use perplexity as a proxy for quality, and derive new schemes that are optimal with respect to this constraint. Our experimental evaluation validates our framework: watermarking schemes derived from a given constraint consistently maximize detection power with respect to that constraint.
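To make the "constrained optimization" view concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's actual formulation, which the abstract does not spell out. It assumes a toy next-token distribution `p`, a hypothetical 0/1 "green" score per token, and a KL-divergence budget `eps` as the quality constraint. Maximizing the expected green score subject to KL(q || p) <= eps has the standard exponential-tilting solution q ∝ p · exp(delta · green), which has the same form as adding a constant `delta` to the logits of green tokens in red-green-list watermarks. All function and variable names below are hypothetical.

```python
import math

def tilt(p, green, delta):
    """Exponentially tilt p toward green tokens: q_i ∝ p_i * exp(delta * green_i)."""
    weights = [pi * math.exp(delta * gi) for pi, gi in zip(p, green)]
    z = sum(weights)
    return [w / z for w in weights]

def kl(q, p):
    """KL(q || p) in nats; terms with q_i == 0 contribute nothing."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def green_mass(q, green):
    """Expected green score under q (used here as a stand-in for detection signal)."""
    return sum(qi * gi for qi, gi in zip(q, green))

def solve_for_budget(p, green, eps, hi=50.0, iters=60):
    """Bisect on delta so that KL(tilt(p) || p) is approximately eps.

    KL of the tilted distribution is non-decreasing in delta for delta >= 0,
    so bisection is valid. Assumes eps is below the maximum reachable KL.
    """
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(tilt(p, green, mid), p) > eps:
            hi = mid
        else:
            lo = mid
    return lo

# Toy example (assumed values): four tokens, two of them "green".
p = [0.5, 0.3, 0.15, 0.05]
green = [0, 1, 0, 1]
eps = 0.05

delta = solve_for_budget(p, green, eps)
q = tilt(p, green, delta)
print("delta:", delta)
print("green mass before/after:", green_mass(p, green), green_mass(q, green))
```

Changing the constraint (for example, swapping the KL budget for a different quality proxy) changes the form of the optimal `q`, which is the sense in which a constraint determines a watermarking scheme in this sketch.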
