CV | AI | CL | Mar 8, 2025

Explainable Synthetic Image Detection through Diffusion Timestep Ensembling

arXiv:2503.06201v2 | 6 citations | h-index: 10 | Has Code
AI Analysis

This work addresses security risks from the misuse of AI-generated images, offering incremental improvements in detection accuracy and explainability.

The paper tackles the problem of detecting synthetic images generated by diffusion models by proposing a method that uses features from multiple noised timesteps, achieving state-of-the-art detection accuracies of 98.91% on regular samples and 95.89% on challenging ones.

Recent advances in diffusion models have enabled the creation of deceptively real images, posing significant security risks when misused. In this study, we empirically show that different timesteps of DDIM inversion reveal varying subtle distinctions between synthetic and real images that are extractable for detection, such as high-frequency discrepancies in the Fourier power spectrum and differences in inter-pixel variance distributions. Based on these observations, we propose a novel synthetic image detection method that directly utilizes features of intermediately noised images by training an ensemble on multiple noised timesteps, circumventing conventional reconstruction-based strategies. To enhance human comprehension, we introduce a metric-grounded explanation generation and refinement module to identify and explain AI-generated flaws. Additionally, we construct the GenHard and GenExplain benchmarks to provide detection samples of greater difficulty and high-quality rationales for fake images. Extensive experiments show that our method achieves state-of-the-art performance with 98.91% and 95.89% detection accuracy on regular and challenging samples, respectively, and demonstrates generalizability and robustness. Our code and datasets are available at https://github.com/Shadowlized/ESIDE.
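
As a rough illustration of the kind of Fourier-domain cue the abstract mentions, the sketch below measures how much of a grayscale image's spectral power lies at high spatial frequencies. It is not the authors' pipeline: the DDIM inversion step and the trained ensemble are omitted, and the function name `high_frequency_ratio` and its `cutoff` parameter are hypothetical choices made only for this example.

```python
import numpy as np


def high_frequency_ratio(image: np.ndarray, cutoff: float = 0.5) -> float:
    """Return the fraction of spectral power above a normalized radial frequency.

    `image` is a 2D grayscale array. `cutoff` is a hypothetical threshold in
    units where 1.0 is the Nyquist frequency along each axis.
    """
    img = image.astype(np.float64)
    img = img - img.mean()  # drop the DC component so it does not dominate

    # Power spectrum with the zero frequency shifted to the array center.
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

    h, w = img.shape
    yy, xx = np.indices((h, w))
    # Normalized distance of every frequency bin from the center.
    radius = np.hypot((yy - h // 2) / (h / 2), (xx - w // 2) / (w / 2))

    total = power.sum()
    if total == 0:
        return 0.0  # constant image: no spectral content to compare
    return float(power[radius > cutoff].sum() / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.arange(64)
    smooth = np.tile(np.sin(2 * np.pi * x / 64), (64, 1))  # low-frequency pattern
    noisy = rng.standard_normal((64, 64))                  # broadband noise
    print("smooth:", high_frequency_ratio(smooth))
    print("noisy: ", high_frequency_ratio(noisy))
```

A detector in the spirit of the abstract would compute features like this on images noised to several timesteps and feed them to a learned ensemble; a single ratio is only a hand-crafted stand-in for those learned features.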

