Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection
This work addresses the security of forensic detectors for AI-generated images, a topic crucial for preventing malicious use; however, it is incremental, as it builds on existing methods without introducing new paradigms.
The paper investigates the adversarial robustness of AI-generated image detectors, specifically comparing CLIP-based methods with CNN-based ones. It finds that, while both are vulnerable to white-box attacks, these attacks do not transfer easily between the two families, a difference also reflected in their frequency-domain noise patterns.
In recent years, many forensic detectors have been proposed to detect AI-generated images and prevent their use for malicious purposes. Convolutional neural networks (CNNs) have long been the dominant architecture in this field and have been the subject of intense study. However, recently proposed Transformer-based detectors have been shown to match or even outperform CNN-based detectors, especially in terms of generalization. In this paper, we study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods that rely on Vision Transformer (ViT) backbones and comparing their performance with CNN-based methods. We study the robustness to different adversarial attacks under a variety of conditions and analyze both numerical results and frequency-domain patterns. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. However, attacks do not easily transfer between CNN-based and CLIP-based methods. This is also confirmed by the different distribution of the adversarial noise patterns in the frequency domain. Overall, this analysis provides new insights into the properties of forensic detectors that can help develop more effective strategies.
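The frequency-domain comparison of adversarial noise described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes grayscale images stored as float NumPy arrays of shape (N, H, W), and the helper names `noise_power_spectrum` and `high_freq_fraction`, as well as the default cutoff radius, are hypothetical choices. It averages the power spectrum of the adversarial perturbation (adversarial image minus clean image) over a batch, then reports what fraction of that energy lies outside a centered low-frequency disk.

```python
import numpy as np


def noise_power_spectrum(clean: np.ndarray, adv: np.ndarray) -> np.ndarray:
    """Batch-averaged, centered power spectrum of the perturbation adv - clean.

    Assumption: clean and adv are float arrays of shape (N, H, W), grayscale.
    """
    noise = adv - clean
    # 2D FFT over the spatial axes; fftshift moves the DC term to (H//2, W//2).
    spec = np.fft.fftshift(np.fft.fft2(noise, axes=(-2, -1)), axes=(-2, -1))
    # Average the squared magnitude over the batch dimension.
    return np.mean(np.abs(spec) ** 2, axis=0)


def high_freq_fraction(power: np.ndarray, radius_frac: float = 0.25) -> float:
    """Fraction of spectral energy outside a centered disk.

    Hypothetical parameter: radius_frac sets the disk radius as a fraction
    of min(H, W) / 2, i.e. of the Nyquist radius along the shorter axis.
    """
    h, w = power.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Distance of each frequency bin from the centered DC component.
    r = np.hypot(yy - h // 2, xx - w // 2)
    cutoff = radius_frac * min(h, w) / 2
    return float(power[r > cutoff].sum() / power.sum())
```

Computing `high_freq_fraction` for perturbations crafted against a CNN-based detector and against a CLIP-based detector would give one simple scalar for comparing where each attack concentrates its energy; plotting `np.log1p(power)` shows the full 2D pattern. A single high/low split is a deliberate simplification; a radially averaged spectrum would preserve more detail.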