94.1IVMar 13
Editing Away the Evidence: Diffusion-Based Image Manipulation and the Failure Modes of Robust WatermarkingQian Qi, Jiangyun Tang, Jim Lee et al.
Robust invisible watermarks are widely used to support copyright protection, content provenance, and accountability by embedding hidden signals designed to survive common post-processing operations. However, diffusion-based image editing introduces a fundamentally different class of transformations: it injects noise and reconstructs images through a powerful generative prior, often altering semantic content while preserving photorealism. In this paper, we provide a unified theoretical and empirical analysis showing that non-adversarial diffusion editing can unintentionally degrade or remove robust watermarks. We model diffusion editing as a stochastic transformation that progressively contracts off-manifold perturbations, causing the low-amplitude signals used by many watermarking schemes to decay. Our analysis derives bounds on watermark signal-to-noise ratio and mutual information along diffusion trajectories, yielding conditions under which reliable recovery becomes information-theoretically impossible. We further evaluate representative watermarking systems under a range of diffusion-based editing scenarios and strengths. The results indicate that even routine semantic edits can significantly reduce watermark recoverability. Finally, we discuss the implications for content provenance and outline principles for designing watermarking approaches that remain robust under generative image editing.
CVOct 7, 2025
Diffusion-Based Image Editing for Breaking Robust WatermarksYunyi Ni, Finn Carter, Ze Niu et al.
Robust invisible watermarking aims to embed hidden information into images such that the watermark can survive various image manipulations. However, the rise of powerful diffusion-based image generation and editing techniques poses a new threat to these watermarking schemes. In this paper, we present a theoretical study and method demonstrating that diffusion models can effectively break robust image watermarks that were designed to resist conventional perturbations. We show that a diffusion-driven ``image regeneration'' process can erase embedded watermarks while preserving perceptual image content. We further introduce a novel guided diffusion attack that explicitly targets the watermark signal during generation, significantly degrading watermark detectability. Theoretically, we prove that as an image undergoes sufficient diffusion-based transformation, the mutual information between the watermarked image and the embedded watermark payload vanishes, resulting in decoding failure. Experimentally, we evaluate our approach on multiple state-of-the-art watermarking schemes (including the deep learning-based methods StegaStamp, TrustMark, and VINE) and demonstrate near-zero watermark recovery rates after attack, while maintaining high visual fidelity of the regenerated images. Our findings highlight a fundamental vulnerability in current robust watermarking techniques against generative model-based attacks, underscoring the need for new watermarking strategies in the era of generative AI.
82.3MMMar 31
Editing on the Generative Manifold: A Theoretical and Empirical Study of General Diffusion-Based Image Editing Trade-offsYi Hu, Leying Yi, Emily Davis et al.
Diffusion-based editing has rapidly evolved from curated inpainting tools into general-purpose editors spanning text-guided instruction following, mask-localized edits, drag-based geometric manipulation, exemplar transfer, and training-free composition systems. Despite strong empirical progress, the field lacks a unified treatment of core desiderata that govern practical usability: controllability (how precisely and continuously the user can specify an edit), faithfulness to user intent (semantic alignment to instructions), semantic consistency (preservation of identity and non-target content), locality (containment of changes), and perceptual quality (artifact suppression and detail retention). This paper provides a theoretical and empirical analysis of general diffusion-based image editing, connecting diverse paradigms through a common view of editing as guided transport on a learned image manifold. We first formalize editing as an operator induced by a conditional reverse-time generative process and define task-agnostic metrics capturing instruction adherence, region preservation, semantic consistency, and stability under repeated edits. We then develop theory describing edit dynamics under (i) noise-injection and denoising transport, (ii) inversion-and-edit pipelines and the propagation of inversion errors, and (iii) locality constraints implemented via masked guidance or hard constraints. Under mild Lipschitz assumptions on the learned score or flow field, we derive bounds connecting guidance strength and inversion error to measurable deviations in non-target regions, and we characterize accumulation effects under iterative multi-turn editing. Empirically, we benchmark representative paradigms.
CVSep 15, 2025
Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and RobustnessZixuan Fu, Yan Ren, Finn Carter et al.
Diffusion models have achieved unprecedented success in image generation but pose increasing risks in terms of privacy, fairness, and security. A growing demand exists to \emph{erase} sensitive or harmful concepts (e.g., NSFW content, private individuals, artistic styles) from these models while preserving their overall generative capabilities. We introduce \textbf{SCORE} (Secure and Concept-Oriented Robust Erasure), a novel framework for robust concept removal in diffusion models. SCORE formulates concept erasure as an \emph{adversarial independence} problem, theoretically guaranteeing that the model's outputs become statistically independent of the erased concept. Unlike prior heuristic methods, SCORE minimizes the mutual information between a target concept and generated outputs, yielding provable erasure guarantees. We provide formal proofs establishing convergence properties and derive upper bounds on residual concept leakage. Empirically, we evaluate SCORE on Stable Diffusion and FLUX across four challenging benchmarks: object erasure, NSFW removal, celebrity face suppression, and artistic style unlearning. SCORE consistently outperforms state-of-the-art methods including EraseAnything, ANT, MACE, ESD, and UCE, achieving up to \textbf{12.5\%} higher erasure efficacy while maintaining comparable or superior image quality. By integrating adversarial optimization, trajectory consistency, and saliency-driven fine-tuning, SCORE sets a new standard for secure and robust concept erasure in diffusion models.
CVJul 16, 2025
FADE: Adversarial Concept Erasure in Flow ModelsZixuan Fu, Yan Ren, Finn Carter et al.
Diffusion models have demonstrated remarkable image generation capabilities, but also pose risks in privacy and fairness by memorizing sensitive concepts or perpetuating biases. We propose a novel \textbf{concept erasure} method for text-to-image diffusion models, designed to remove specified concepts (e.g., a private individual or a harmful stereotype) from the model's generative repertoire. Our method, termed \textbf{FADE} (Fair Adversarial Diffusion Erasure), combines a trajectory-aware fine-tuning strategy with an adversarial objective to ensure the concept is reliably removed while preserving overall model fidelity. Theoretically, we prove a formal guarantee that our approach minimizes the mutual information between the erased concept and the model's outputs, ensuring privacy and fairness. Empirically, we evaluate FADE on Stable Diffusion and FLUX, using benchmarks from prior work (e.g., object, celebrity, explicit content, and style erasure tasks from MACE). FADE achieves state-of-the-art concept removal performance, surpassing recent baselines like ESD, UCE, MACE, and ANT in terms of removal efficacy and image quality. Notably, FADE improves the harmonic mean of concept removal and fidelity by 5--10\% over the best prior method. We also conduct an ablation study to validate each component of FADE, confirming that our adversarial and trajectory-preserving objectives each contribute to its superior performance. Our work sets a new standard for safe and fair generative modeling by unlearning specified concepts without retraining from scratch.