Watermarking Should Be Treated as a Monitoring Primitive
For researchers and practitioners in generative model safety, this work highlights a fundamental limitation of current watermarking evaluations and urges that aggregation-based threats be taken into account.
The paper argues that watermarking should be evaluated as a monitoring primitive rather than only for per-sample robustness. It shows that observers can aggregate watermark signals across outputs to infer entity-level information, which reveals a dual-use tension between attribution and monitoring.
Watermarking is widely proposed for provenance, attribution, and safety monitoring in generative models, yet is typically evaluated only under adversaries who attempt to evade detection or induce false positives at the level of individual samples. We argue that watermarking should be treated as a monitoring primitive, and that internal monitoring is unavoidable given per-entity attribution keys and messages, as well as detector access. We introduce an observer-based threat model in which observers can aggregate watermark signals across outputs to infer entity-level information, showing that even zero-bit watermarking enables attribution under multi-key settings. We further show that external monitoring can emerge over time from persistent, key-dependent statistical structure, although this depends on watermark design and may be mitigated by distribution-preserving or undetectable schemes. Our findings reveal a fundamental dual-use tension between attribution and monitoring, motivating evaluation of watermarking beyond per-sample robustness to account for aggregation and observer-based capabilities.
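The aggregation argument can be illustrated with a small simulation. The sketch below is not the paper's construction: it assumes a toy hash-based "green list" zero-bit watermark, a uniform token sampler, and an observer who holds the candidate per-entity keys (the internal-monitoring setting). All names (`is_green`, `generate`, `z_score`, `attribute`) and parameters (`VOCAB`, `bias`) are hypothetical and chosen only for illustration.

```python
import hashlib
import math
import random

VOCAB = 50_000  # assumed toy vocabulary size


def is_green(key: str, token: int) -> bool:
    """Key-dependent pseudo-random split of the vocabulary into two halves."""
    digest = hashlib.sha256(f"{key}:{token}".encode()).digest()
    return digest[0] < 128


def generate(key: str, n_tokens: int, bias: float, rng: random.Random) -> list:
    """Toy watermarked sampler: with probability `bias`, force a green token."""
    tokens = []
    for _ in range(n_tokens):
        tok = rng.randrange(VOCAB)
        if rng.random() < bias:
            while not is_green(key, tok):
                tok = rng.randrange(VOCAB)
        tokens.append(tok)
    return tokens


def z_score(key: str, tokens: list) -> float:
    """Zero-bit detection statistic: green count vs. the ~50% expected by chance."""
    n = len(tokens)
    green = sum(is_green(key, t) for t in tokens)
    return (green - 0.5 * n) / math.sqrt(0.25 * n)


def attribute(outputs: list, candidate_keys: list):
    """Pool tokens across outputs and score every candidate key on the pool."""
    pooled = [t for out in outputs for t in out]
    scores = {k: z_score(k, pooled) for k in candidate_keys}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    keys = [f"entity-{i}" for i in range(5)]
    # 200 short outputs, all produced under one entity's key.
    outputs = [generate("entity-3", 20, 0.2, rng) for _ in range(200)]
    print("single-output z:", round(z_score("entity-3", outputs[0]), 2))
    winner, scores = attribute(outputs, keys)
    print("attributed to:", winner)
    print({k: round(s, 2) for k, s in scores.items()})
```

In this toy setting a single 20-token output carries only a weak signal, while the pooled statistic grows roughly with the square root of the total token count, so the key that produced the outputs separates from the others. Each detector answers only a yes/no question, yet comparing the answers across keys yields an entity-level attribution. How quickly this happens in practice would depend on the actual watermark design, which this sketch does not model.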