Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation
This work exposes a fundamental security tradeoff in watermarking for AI-generated content, a tradeoff that matters to developers and users who rely on provenance detection.
The paper addresses the problem that robust watermarks for AI-generated content leak exploitable patterns through their redundancy. It proposes an attack framework that extracts these patterns using multi-channel feature learning, achieving a 60% gain in detection-evasion success rate and a 51% improvement in forgery accuracy over state-of-the-art methods.
Watermarking plays a key role in the provenance and detection of AI-generated content. While existing methods prioritize robustness against real-world distortions (e.g., JPEG compression and noise addition), we reveal a fundamental tradeoff: such robust watermarks inherently increase the redundancy of the detectable patterns encoded into images, creating exploitable information leakage. To exploit this leakage, we propose an attack framework that extracts watermark patterns through multi-channel feature learning using a pre-trained vision model. Unlike prior work that requires massive data or detector access, our method achieves both forgery and detection evasion with a single watermarked image. Extensive experiments demonstrate that our method achieves a 60% gain in detection-evasion success rate and a 51% improvement in forgery accuracy compared to state-of-the-art methods while maintaining visual fidelity. Our work exposes the robustness-stealthiness paradox: current "robust" watermarks sacrifice security for distortion resistance. This finding provides insights for future watermark design.