Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
This work addresses a practical bottleneck for deploying watermarking in large language models by enabling efficient inference without compromising provenance tracing, though the contribution is incremental in that it builds on existing watermarking schemes.
The paper tackles the trade-off between watermark strength and speculative sampling efficiency in language models and shows that it is not absolute. It introduces a mechanism that injects pseudorandomness into draft-token acceptance, maintaining speculative sampling efficiency while maximizing watermark strength, and its experiments show improved detectability without sacrificing efficiency.
Watermarking is a principled approach for tracing the provenance of large language model (LLM) outputs, but its deployment in practice is hindered by inference inefficiency. Speculative sampling accelerates inference, with efficiency improving as the acceptance rate between draft and target models increases. Yet recent work reveals a fundamental trade-off: higher watermark strength reduces acceptance, preventing their simultaneous achievement. We revisit this trade-off and show it is not absolute. We introduce a quantitative measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers. Using this measure, we fully characterize the trade-off as a constrained optimization problem and derive explicit Pareto curves for two existing watermarking schemes. Finally, we introduce a principled mechanism that injects pseudorandomness into draft-token acceptance, ensuring maximal watermark strength while maintaining speculative sampling efficiency. Experiments further show that this approach improves detectability without sacrificing efficiency. Our findings uncover a principle that unites speculative sampling and watermarking, paving the way for their efficient and practical deployment.
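To make the quantities in the abstract concrete, the following is a minimal sketch of the standard speculative sampling accept/reject rule, with the acceptance randomness supplied by a keyed pseudorandom number rather than a fresh random draw. It is an illustration under stated assumptions, not the paper's mechanism, which the abstract does not specify: the function names, the SHA-256 based `prf_uniform`, and the toy distributions `p` (target) and `q` (draft) are all hypothetical. The sketch shows that the acceptance rate equals the sum of min(p, q) over tokens, and that the accept-or-resample rule reproduces the target distribution exactly.

```python
import hashlib


def acceptance_rate(p, q):
    """Probability that a token drafted from q is accepted: sum_x min(p(x), q(x))."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def residual(p, q):
    """Normalized max(0, p - q), the distribution sampled after a rejection."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(r)
    # If p == q there is no residual mass; rejection then never happens.
    return [ri / total for ri in r] if total > 0 else list(p)


def prf_uniform(key, context, tag):
    """Hypothetical keyed pseudorandom number in [0, 1) (assumption, not from the paper)."""
    digest = hashlib.sha256(f"{key}|{context}|{tag}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sample_from(dist, u):
    """Inverse-CDF sampling: the token is a deterministic function of u."""
    cumulative = 0.0
    for token, prob in enumerate(dist):
        cumulative += prob
        if u < cumulative:
            return token
    return len(dist) - 1


def speculative_step(p, q, draft_token, u_accept, u_resample):
    """Accept the draft token iff u_accept < min(1, p/q); otherwise resample."""
    if q[draft_token] > 0 and u_accept * q[draft_token] < p[draft_token]:
        return draft_token, True
    return sample_from(residual(p, q), u_resample), False


def output_distribution(p, q):
    """Exact distribution of the emitted token when the draft is sampled from q."""
    a = acceptance_rate(p, q)
    res = residual(p, q)
    return [min(pi, qi) + (1.0 - a) * ri for pi, qi, ri in zip(p, q, res)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution (assumption)
    q = [0.2, 0.5, 0.3]  # toy draft distribution (assumption)
    u = prf_uniform("secret-key", "previous tokens", "accept")
    print(acceptance_rate(p, q))
    print(output_distribution(p, q))
    print(speculative_step(p, q, 1, u, prf_uniform("secret-key", "previous tokens", "resample")))
```

Because `u_accept` and `u_resample` are derived from a key and the context, a detector holding the same key could recompute them, which is the sense in which tokens become functions of pseudorandom numbers. How the paper couples this with a specific watermarking scheme is not described in the abstract and is left open here.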