Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images
This work addresses the challenge of digital authenticity and deepfake detection, which is critical for security and media integrity, although it is an incremental improvement over existing detection methods.
The authors tackle the problem of detecting diffusion model-generated images with a robust detection framework that integrates CLIP features with an MLP classifier, reporting state-of-the-art performance in distinguishing real from synthetic content.
Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields. However, their ability to create hyper-realistic images makes it difficult to distinguish real from synthetic content, raising concerns about digital authenticity and potential misuse in creating deepfakes. This work introduces a robust detection framework that integrates image and text features extracted by the CLIP model with a Multilayer Perceptron (MLP) classifier. We propose a novel loss that improves the detector's robustness and handles imbalanced datasets. Additionally, we flatten the loss landscape during model training to improve the detector's generalization capabilities. Extensive experiments demonstrate the effectiveness of our method, which outperforms traditional detection techniques, underscoring its potential to set a new state of the art in DM-generated image detection. The code is available at https://github.com/Purdue-M2/Robust_DM_Generated_Image_Detection.
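To make the described architecture concrete, the following is a minimal sketch of how image and text features could be fused and passed to an MLP classifier. It is not the authors' implementation. Several things here are assumptions made for illustration: fusing by concatenation, the feature dimension, the single hidden layer and its width, and all function names. Random vectors stand in for real CLIP embeddings, and the weights are untrained, so the printed probability carries no meaning beyond showing the data flow.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Create a dense layer as (weights, bias) with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply a dense layer: one dot product plus bias per output unit."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_fake_probability(image_feat, text_feat, hidden_layer, output_layer):
    """Fuse the two feature vectors and score them with a small MLP.

    Concatenation is an assumed fusion strategy; the abstract only says
    that image and text features are integrated.
    """
    fused = list(image_feat) + list(text_feat)
    hidden = relu(linear(fused, hidden_layer))
    logit = linear(hidden, output_layer)[0]
    return sigmoid(logit)


# Hypothetical sizes chosen to keep the example small; real CLIP
# embeddings are much larger.
FEATURE_DIM = 8
HIDDEN_DIM = 16

rng = random.Random(0)
# Stand-ins for the CLIP image embedding and the CLIP text embedding.
image_feat = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
text_feat = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]

hidden_layer = make_linear(2 * FEATURE_DIM, HIDDEN_DIM, rng)
output_layer = make_linear(HIDDEN_DIM, 1, rng)

prob = predict_fake_probability(image_feat, text_feat, hidden_layer, output_layer)
print(f"P(generated) = {prob:.3f}")
```

In a real pipeline the two stand-in vectors would be replaced by embeddings from a CLIP image encoder and text encoder, and the MLP weights would be learned with the loss described in the abstract; neither step is shown here.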