CV · May 28, 2025

Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective

arXiv:2505.22604v2 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses robust detection of AI-generated images for security applications. It offers a novel defense against adversarial attacks that requires no retraining, though the advance is incremental because it builds on existing detectors.

The paper tackles the vulnerability of AI-generated image detectors to adversarial attacks. It identifies that adversarial training causes performance collapse due to feature entanglement, and proposes TRIM, a training-free defense based on information-theoretic measures. TRIM outperforms the state-of-the-art defense by 33.88% on ProGAN and 28.91% on GenImage while maintaining original accuracy.

Rapid advances in Artificial Intelligence Generated Images (AIGI) have facilitated malicious use, such as forgery and misinformation. Therefore, numerous methods have been proposed to detect fake images. Although such detectors have been proven to be universally vulnerable to adversarial attacks, defenses in this field are scarce. In this paper, we first identify that adversarial training (AT), widely regarded as the most effective defense, suffers from performance collapse in AIGI detection. Through an information-theoretic lens, we further attribute the cause of collapse to feature entanglement, which disrupts the preservation of feature-label mutual information. In contrast, standard detectors show clear feature separation. Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM), the first training-free adversarial defense for AIGI detection. TRIM builds on standard detectors and quantifies feature shifts using prediction entropy and KL divergence. Extensive experiments across multiple datasets and attacks validate the superiority of TRIM, e.g., it outperforms the state-of-the-art defense by 33.88% on ProGAN and 28.91% on GenImage, while maintaining original accuracy.
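The two quantities named in the abstract, prediction entropy and KL divergence, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how TRIM combines the two measures, so the function `looks_shifted`, its thresholds, and the idea of comparing against a reference prediction are all hypothetical choices made here for illustration only.

```python
import math


def softmax(logits):
    """Convert raw detector logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a predicted distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats between two predicted distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def looks_shifted(logits, reference_logits,
                  entropy_threshold=0.5, kl_threshold=0.1):
    """Hypothetical decision rule (an assumption, not TRIM's rule):
    flag an input when its prediction is unusually uncertain or when it
    diverges from a reference prediction for the same image."""
    p = softmax(logits)
    q = softmax(reference_logits)
    return (prediction_entropy(p) > entropy_threshold
            or kl_divergence(p, q) > kl_threshold)


if __name__ == "__main__":
    confident = [5.0, -5.0]   # detector is sure about its label
    uncertain = [0.1, -0.1]   # detector is close to a coin flip
    print(looks_shifted(confident, confident))
    print(looks_shifted(uncertain, confident))
```

In this sketch a confident, self-consistent prediction has near-zero entropy and zero divergence, while a prediction pushed toward the decision boundary scores high on both. The threshold values are placeholders and would need to be calibrated on clean data in any real use.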
