CV · Mar 19

Transferable Multi-Bit Watermarking Across Frozen Diffusion Models via Latent Consistency Bridges

arXiv:2603.20304 · 47.3 · h-index: 34
Predicted impact: top 72% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient and flexible watermarking in diffusion models. It offers a plug-and-play solution that advances incrementally on existing methods by combining the advantages of sampling-based and fine-tuning-based approaches.

The paper tackles the problem of watermarking diffusion models for provenance and accountability. It proposes DiffMark, which enables single-pass multi-bit detection in 16.4 ms (a 45x speedup over sampling-based methods) and cross-model transferability without per-model fine-tuning.
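As a quick sanity check on the two figures quoted above, the snippet below back-computes the detection latency that a 45x speedup would imply for the sampling-based baseline. It assumes the speedup is a plain ratio of per-image detection latencies. The baseline value is derived here rather than reported in the summary, and the variable names are illustrative.

```python
diffmark_ms = 16.4   # single-pass detection latency quoted in the summary
speedup = 45         # quoted speedup over sampling-based detection

# Assumption: speedup = baseline latency / DiffMark latency.
implied_baseline_ms = diffmark_ms * speedup
print(f"implied sampling-based latency: {implied_baseline_ms:.0f} ms")
```

Under that assumption the sampling-based detector would take roughly 738 ms per image, which is plausible for a 50-step inversion but should be checked against the paper's own tables.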

As diffusion models (DMs) enable photorealistic image generation at unprecedented scale, watermarking techniques have become essential for establishing provenance and accountability. Existing methods face two challenges: sampling-based approaches operate on frozen models but require costly $N$-step Denoising Diffusion Implicit Model (DDIM) inversion (typically $N=50$) and support only zero-bit detection; fine-tuning-based methods achieve fast multi-bit extraction but couple the watermark to a specific model checkpoint, requiring retraining for each architecture. We propose DiffMark, a plug-and-play watermarking method that offers three key advantages over existing approaches: single-pass multi-bit detection, per-image key flexibility, and cross-model transferability. Rather than encoding the watermark into the initial noise vector, DiffMark injects a persistent learned perturbation $\delta$ at every denoising step of a completely frozen DM. The watermark signal accumulates in the final denoised latent $z_0$ and is recovered in a single forward pass. The central challenge, backpropagating gradients through a frozen UNet without traversing the full denoising chain, is addressed by using Latent Consistency Models (LCMs) as a differentiable training bridge. This reduces the number of gradient steps from 50 (DDIM) to 4 (LCM) and enables single-pass detection in 16.4 ms, a 45x speedup over sampling-based methods. Moreover, by design, the encoder learns to map any runtime secret to a unique perturbation at inference time, providing genuine per-image key flexibility and transferability to unseen diffusion-based architectures without per-model fine-tuning. Alongside these advantages, DiffMark maintains competitive watermark robustness against distortion, regeneration, and adversarial attacks.
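The core mechanism described above (the same perturbation added at every denoising step, then read out of the final latent in one pass) can be illustrated with a deliberately simplified toy. This is a sketch under heavy assumptions, not the paper's implementation: the "frozen denoiser" is a scalar contraction rather than a UNet, the encoder is a fixed sign pattern rather than a learned network, and the detector is a per-bit sum rather than a learned decoder. All names and constants (`N_BITS`, `DIM`, `SHRINK`, `STRENGTH`, `encode`, `generate`, `detect`) are hypothetical.

```python
import random

N_BITS = 8        # hypothetical message length
DIM = 256         # hypothetical latent dimension
STEPS = 4         # number of denoising steps in this toy
SHRINK = 0.5      # toy "frozen denoiser": contracts the latent each step
STRENGTH = 0.1    # hypothetical perturbation magnitude


def encode(bits):
    """Map a bit string to a perturbation delta (stand-in for a learned encoder)."""
    # Latent entry j carries bit j % N_BITS, with sign +1 for a 1 and -1 for a 0.
    return [STRENGTH * (1.0 if bits[j % N_BITS] else -1.0) for j in range(DIM)]


def generate(z_T, delta):
    """Run the toy frozen denoiser, adding the same delta after every step."""
    z = list(z_T)
    for _ in range(STEPS):
        z = [SHRINK * v + d for v, d in zip(z, delta)]
    return z


def detect(z_0):
    """Single-pass detection: sign of the summed latent entries assigned to each bit."""
    sums = [0.0] * N_BITS
    for j, v in enumerate(z_0):
        sums[j % N_BITS] += v
    return [1 if s > 0 else 0 for s in sums]


rng = random.Random(0)
secret = [rng.randint(0, 1) for _ in range(N_BITS)]   # per-image runtime secret
z_T = [rng.gauss(0.0, 1.0) for _ in range(DIM)]       # initial noise latent

z_0 = generate(z_T, encode(secret))
recovered = detect(z_0)
bit_accuracy = sum(a == b for a, b in zip(secret, recovered)) / N_BITS
print("bit accuracy:", bit_accuracy)
```

In this toy, four steps leave the initial noise scaled by $0.5^4 = 0.0625$ while the perturbation accumulates to $(1 + 0.5 + 0.25 + 0.125)\,\delta = 1.875\,\delta$, which is why the bits can be read directly from $z_0$ without inverting the sampling chain. Nothing here says anything about robustness or image quality, which depend on the real model and the learned encoder.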

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
