CV | May 21, 2025

Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection

Haotian Qin, Dongliang Chang, Yueying Gao, Bingyao Yu, Lei Chen, Zhanyu Ma

arXiv:2505.15217v110.24 citations | h-index: 20 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the generalization challenge in AI-generated image detection, which matters for content moderation and security applications, though it appears to be an incremental advance over existing CLIP-based methods.

The paper tackles feature redundancy in CLIP-based AI-generated image detection by proposing a multimodal conditional information bottleneck network, and reports exceptional generalization performance on the GenImage dataset and on images from the latest generative models.

Although existing CLIP-based methods for detecting AI-generated images have achieved promising results, they are still limited by severe feature redundancy, which hinders their generalization ability. To address this issue, incorporating an information bottleneck network into the task presents a straightforward solution. However, relying solely on image-corresponding prompts results in suboptimal performance due to the inherent diversity of prompts. In this paper, we propose a multimodal conditional bottleneck network to reduce feature redundancy while enhancing the discriminative power of features extracted by CLIP, thereby improving the model's generalization ability. We begin with a semantic analysis experiment, where we observe that arbitrary text features exhibit lower cosine similarity with real image features than with fake image features in the CLIP feature space, a phenomenon we refer to as "bias". Therefore, we introduce InfoFD, a text-guided AI-generated image detection framework. InfoFD consists of two key components: the Text-Guided Conditional Information Bottleneck (TGCIB) and Dynamic Text Orthogonalization (DTO). TGCIB improves the generalizability of learned representations by conditioning on both text and class modalities. DTO dynamically updates weighted text features, preserving semantic information while leveraging the global "bias". Our model achieves exceptional generalization performance on the GenImage dataset and latest generative models. Our code is available at https://github.com/Ant0ny44/InfoFD.
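
The "bias" described in the abstract is a gap in average cosine similarity: text features sit closer to fake image features than to real ones. A minimal sketch of how such a gap could be measured is given here, assuming the CLIP text and image features have already been extracted into arrays. The function names (`l2_normalize`, `mean_cosine_similarity`, `similarity_gap`) are hypothetical and are not taken from the InfoFD code; this is an illustration of the measurement, not the authors' experiment.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def mean_cosine_similarity(text_feats, image_feats):
    """Average cosine similarity over all (text, image) pairs.

    text_feats:  array of shape (num_texts, dim)
    image_feats: array of shape (num_images, dim)
    """
    t = l2_normalize(np.asarray(text_feats, dtype=np.float64))
    v = l2_normalize(np.asarray(image_feats, dtype=np.float64))
    # Rows are unit vectors, so the dot product equals the cosine similarity.
    return float((t @ v.T).mean())


def similarity_gap(text_feats, real_feats, fake_feats):
    """Mean text-to-fake similarity minus mean text-to-real similarity.

    A positive value matches the direction of the "bias" in the abstract.
    """
    return (mean_cosine_similarity(text_feats, fake_feats)
            - mean_cosine_similarity(text_feats, real_feats))
```

In practice the three inputs would be feature matrices from a CLIP text encoder and image encoder; a consistently positive `similarity_gap` across arbitrary prompts would correspond to the phenomenon the abstract reports.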
