CV | AI | Jan 2

A Comprehensive Dataset for Human vs. AI Generated Image Detection

arXiv:2601.00553v1 | 2 citations | h-index: 16 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the urgent need for tools that distinguish AI-generated images from real ones, but it is incremental because it builds on existing datasets and methods.

The paper tackles the problem of detecting AI-generated images to combat misinformation by releasing MS COCOAI, a dataset with 96,000 real and synthetic datapoints from five generators, and proposes two classification tasks.

Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and manipulated media. As generated images become harder to distinguish from photographs, detecting them has become an urgent priority. To address this challenge, we release MS COCOAI, a novel dataset for AI-generated image detection consisting of 96,000 real and synthetic datapoints, built using the MS COCO dataset. To generate the synthetic images, we use five generators: Stable Diffusion 3, Stable Diffusion 2.1, SDXL, DALL-E 3, and MidJourney v6. Based on the dataset, we propose two tasks: (1) classifying images as real or generated, and (2) identifying which model produced a given synthetic image. The dataset is available at https://huggingface.co/datasets/Rajarshi-Roy-research/Defactify_Image_Dataset.
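
The two tasks imply two label spaces: a binary one (real vs. generated) and a five-way one over the generators. The following is a minimal sketch of how those labels could be encoded. The function names, the integer encoding, and the use of `None` to mark a real image are illustrative assumptions, not the dataset's actual schema.

```python
from typing import Optional

# The five generators named in the abstract. The ordering (and therefore
# the integer ids below) is an assumption made for illustration only.
GENERATORS = [
    "Stable Diffusion 3",
    "Stable Diffusion 2.1",
    "SDXL",
    "DALL-E 3",
    "MidJourney v6",
]


def task_a_label(source: Optional[str]) -> int:
    """Task 1 (real vs. generated): 0 for a real image, 1 for a generated one.

    `source` is None for a real image, otherwise the generator's name
    (a hypothetical convention, not the dataset's own field).
    """
    if source is None:
        return 0
    if source not in GENERATORS:
        raise ValueError(f"unknown generator: {source!r}")
    return 1


def task_b_label(source: str) -> int:
    """Task 2 (model attribution): index of the generator of a synthetic image.

    Raises ValueError if `source` is not one of the five generators.
    """
    return GENERATORS.index(source)
```

Under this encoding, a detector for the first task is a binary classifier, while the second task is a five-class problem defined only on synthetic images.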

