CV · Mar 24, 2025

SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection

arXiv:2503.18812v1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of identifying AI-generated images for security and verification purposes, representing an incremental improvement in detection methods.

The paper detects AI-generated images by fine-tuning a Vision Transformer with data augmentation. The authors report state-of-the-art performance, significantly outperforming competing methods on both the validation and test datasets.

The aim of this work is to explore the potential of pre-trained vision-language models, e.g. Vision Transformers (ViT), enhanced with advanced data augmentation strategies for the detection of AI-generated images. Our approach leverages a fine-tuned ViT model trained on the Defactify-4.0 dataset, which includes images generated by state-of-the-art models such as Stable Diffusion 2.1, Stable Diffusion XL, Stable Diffusion 3, DALL-E 3, and MidJourney. We employ perturbation techniques like flipping, rotation, Gaussian noise injection, and JPEG compression during training to improve model robustness and generalisation. The experimental results demonstrate that our ViT-based pipeline achieves state-of-the-art performance, significantly outperforming competing methods on both validation and test datasets.
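The perturbations named in the abstract can be sketched in a few lines. The following is a minimal, dependency-free illustration, not the authors' pipeline: it treats an image as a grayscale grid of pixel values, restricts rotation to 90 degrees, and uses hypothetical values for the application probability `p` and the noise level `sigma`. JPEG compression is left out because it needs an image codec (for example, an imaging library), which this sketch deliberately avoids.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise (reverse rows, then transpose)."""
    return [list(col) for col in zip(*img[::-1])]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel, clamped to [0, 255]."""
    return [
        [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]


def augment(img, rng, p=0.5, sigma=5.0):
    """Apply each perturbation independently with probability p.

    p and sigma are illustrative assumptions, not values from the paper.
    """
    if rng.random() < p:
        img = hflip(img)
    if rng.random() < p:
        img = rot90(img)
    if rng.random() < p:
        img = add_gaussian_noise(img, sigma, rng)
    return img


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sample = [[0.0, 64.0], [128.0, 255.0]]
    print(augment(sample, rng))
```

Applying the perturbations at random during training means the model rarely sees the exact same pixels twice, which is the usual motivation for using augmentation to improve robustness and generalisation.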

Code Implementations: 1 repo
