CVMar 30, 2024Code
Latent Watermark: Inject and Detect Watermarks in Latent Diffusion SpaceZheling Meng, Bo Peng, Jing Dong
Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression, while watermarks with superior robustness usually significantly damage image quality. This dilemma stems from the traditional paradigm where watermarks are injected and detected in pixel space, relying on pixel perturbation for watermark detection and resilience against attacks. In this paper, we highlight that an effective solution to the problem is to both inject and detect watermarks in the latent diffusion space, and propose Latent Watermark with a progressive training strategy. It weakens the direct connection between quality and robustness and thus alleviates their contradiction. We conduct evaluations on two datasets and against 10 watermark attacks. Six metrics measure the image quality and watermark robustness. Results show that compared to the recently proposed methods such as StableSignature, StegaStamp, RoSteALS, LaWa, TreeRing, and DiffuseTrace, LW not only surpasses them in terms of robustness but also offers superior image quality. Our code will be available at https://github.com/RichardSunnyMeng/LatentWatermark.
CVSep 26, 2024
Dark Miner: Defend against undesirable generation for text-to-image diffusion modelsZheling Meng, Bo Peng, Xiaochuan Jin et al.
Text-to-image diffusion models have been demonstrated with undesired generation due to unfiltered large-scale training data, such as sexual images and copyrights, necessitating the erasure of undesired concepts. Most existing methods focus on modifying the generation probabilities conditioned on the texts containing target concepts. However, they fail to guarantee the desired generation of texts unseen in the training phase, especially for the adversarial texts from malicious attacks. In this paper, we analyze the erasure task and point out that existing methods cannot guarantee the minimization of the total probabilities of undesired generation. To tackle this problem, we propose Dark Miner. It entails a recurring three-stage process that comprises mining, verifying, and circumventing. This method greedily mines embeddings with maximum generation probabilities of target concepts and more effectively reduces their generation. In the experiments, we evaluate its performance on the inappropriateness, object, and style concepts. Compared with the previous methods, our method achieves better erasure and defense results, especially under multiple adversarial attacks, while preserving the native generation capability of the models. Our code will be available on GitHub.
CVMar 17, 2024
Artifact Feature Purification for Cross-domain Detection of AI-generated ImagesZheling Meng, Bo Peng, Jing Dong et al.
In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, bring potential security risks to our society. Existing generated image detection methods suffer from performance drop when faced with out-of-domain generators and image scenes. To relieve this problem, we propose Artifact Purification Network (APN) to facilitate the artifact extraction from generated images through the explicit and implicit purification processes. For the explicit one, a suspicious frequency-band proposal method and a spatial feature decomposition method are proposed to extract artifact-related features. For the implicit one, a training strategy based on mutual information estimation is proposed to further purify the artifact-related features. Experiments show that for cross-generator detection, the average accuracy of APN is 5.6% ~ 16.4% higher than the previous 10 methods on GenImage dataset and 1.7% ~ 50.1% on DiffusionForensics dataset. For cross-scene detection, APN maintains its high performance. Via visualization analysis, we find that the proposed method extracts flexible forgery patterns and condenses the forgery information diluted in irrelevant features. We also find that the artifact features APN focuses on across generators and scenes are global and diverse. The code will be available on GitHub.
CVFeb 22, 2025
Concept Corrector: Erase concepts on the fly for text-to-image diffusion modelsZheling Meng, Bo Peng, Xiaochuan Jin et al.
Text-to-image diffusion models have demonstrated the underlying risk of generating various unwanted content, such as sexual elements. To address this issue, the task of concept erasure has been introduced, aiming to erase any undesired concepts that the models can generate. Previous methods, whether training-based or training-free, have primarily focused on the input side, i.e., texts. However, they often suffer from incomplete erasure due to limitations in the generalization from limited prompts to diverse image content. In this paper, motivated by the notion that concept erasure on the output side, i.e., generated images, may be more direct and effective, we propose Concept Corrector. It checks target concepts based on visual features provided by final generated images predicted at certain time steps. Further, it incorporates Concept Removal Attention to erase generated concept features. It overcomes the limitations of existing methods, which are either unable to remove the concept features that have been generated in images or rely on the assumption that the related concept words are contained in input prompts. In the whole pipeline, our method changes no model parameters and only requires a given target concept as well as the corresponding replacement content, which is easy to implement. To the best of our knowledge, this is the first erasure method based on intermediate-generated images, achieving the ability to erase concepts on the fly. The experiments on various concepts demonstrate its impressive erasure performance.