CV · LG · Apr 12, 2024

Detecting AI-Generated Images via CLIP

arXiv:2404.08788v1 · 9 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses the societal need for accessible tools to mitigate the negative effects of AI-generated images, though it is incremental as it applies an existing method to a new task.

The paper tackles the problem of detecting AI-generated images by fine-tuning the CLIP architecture on real and AI-generated images, achieving performance comparable to or better than specialized models while requiring no architectural changes and fewer GPU resources.

As AI-generated image (AIGI) methods become more powerful and accessible, it has become a critical task to determine if an image is real or AI-generated. Because AIGI lack the signatures of photographs and have their own unique patterns, new models are needed to determine if an image is AI-generated. In this paper, we investigate the ability of the Contrastive Language-Image Pre-training (CLIP) architecture, pre-trained on massive internet-scale data sets, to perform this differentiation. We fine-tune CLIP on real images and AIGI from several generative models, enabling CLIP to determine if an image is AI-generated and, if so, determine what generation method was used to create it. We show that the fine-tuned CLIP architecture is able to differentiate AIGI as well as or better than models whose architecture is specifically designed to detect AIGI. Our method will significantly increase access to AIGI-detecting tools and reduce the negative effects of AIGI on society, as our CLIP fine-tuning procedures require no architecture changes from publicly available model repositories and consume significantly fewer GPU resources than other AIGI detection models.
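
To make the idea concrete, here is a minimal sketch of how a CLIP-style model can act as a classifier over "real" and per-generator classes: an image embedding is compared with one text embedding per class by cosine similarity, the scaled similarities go through a softmax, and cross-entropy on the true class is the quantity a fine-tuning step would minimize. This is an illustration under stated assumptions, not the paper's procedure. The abstract does not give the prompts, generators, or loss, so `CLASS_PROMPTS`, the toy 3-dimensional embeddings, and the fixed `logit_scale` are all hypothetical. No CLIP weights are loaded; in practice the embeddings would come from CLIP's image and text encoders.

```python
import math

# Hypothetical class prompts: index 0 is "real", the rest name placeholder generators.
# The abstract does not list the actual generators or any prompt wording.
CLASS_PROMPTS = [
    "a real photograph",
    "an image generated by generator A",
    "an image generated by generator B",
]


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clip_style_probs(image_emb, text_embs, logit_scale=100.0):
    """Class probabilities from scaled cosine similarity (logit_scale is an assumed constant)."""
    img = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]
    return softmax(logits)


def cross_entropy(probs, label):
    """Loss a fine-tuning step would minimize for one labeled image."""
    return -math.log(probs[label])


def classify(image_emb, text_embs):
    """Return (is_ai_generated, predicted prompt, probabilities)."""
    probs = clip_style_probs(image_emb, text_embs)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label != 0, CLASS_PROMPTS[label], probs


# Toy stand-ins for encoder outputs (real CLIP embeddings have hundreds of dimensions).
toy_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_image_emb = [0.1, 0.9, 0.2]

is_ai, predicted_prompt, probs = classify(toy_image_emb, toy_text_embs)
print(is_ai, predicted_prompt)
```

A single prediction of this kind covers both questions the abstract describes: whether the image is AI-generated (any class other than index 0) and, if so, which generation method it is attributed to.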

