CV · Apr 1, 2025

Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-Generated Image Detection

arXiv:2504.00463v2 · 1 citation · h-index: 16
AI Analysis

This work addresses the challenge of detecting AI-generated images across diverse forgery types, offering a domain-specific solution that is incremental in nature.

The paper aims to improve the generalization of AI-generated image detection by combining multiple types of low-level information, such as noise patterns. It proposes the Adaptive Low-level Experts Injection (ALEI) framework, which, after fine-tuning on only four categories of ProGAN data, achieves state-of-the-art results on multiple datasets containing unseen GAN and diffusion methods.

Existing state-of-the-art AI-generated image detection methods mostly extract low-level information, such as noise patterns, from RGB images to improve generalization. However, these methods often consider only a single type of low-level information, which may lead to suboptimal generalization. Through empirical analysis, we discovered a key insight: different types of low-level information often generalize to different types of forgeries. Furthermore, we found that simple fusion strategies are insufficient to exploit the detection advantages of each type of low-level and high-level information across forgery types. Therefore, we propose the Adaptive Low-level Experts Injection (ALEI) framework. Our approach introduces LoRA Experts, enabling the backbone network, which is trained on high-level semantic RGB images, to accept and learn knowledge from different low-level information. We use cross-attention to adaptively fuse these features at intermediate layers. To prevent the backbone network from losing the ability to model the different low-level features in its later stages, we developed a Low-level Information Adapter that interacts with the features extracted by the backbone network. Finally, we propose Dynamic Feature Selection, which dynamically selects the features best suited to detecting the current image in order to maximize generalization. Extensive experiments demonstrate that our method, fine-tuned on only four categories of mainstream ProGAN data, achieves state-of-the-art results on multiple datasets containing unseen GAN and diffusion methods.
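
To make two of the building blocks named in the abstract more concrete, here is a minimal pure-Python sketch of a generic LoRA-style linear layer and of single-query scaled dot-product cross-attention. It is an illustration under stated assumptions, not the ALEI implementation: the abstract does not give layer shapes, ranks, or fusion details, so the function names (`lora_linear`, `cross_attention`), the toy dimensions, and the choice of one backbone query attending over a handful of expert features are all hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lora_linear(x, W, A, B, alpha=1.0):
    """Generic LoRA update: y = W x + (alpha / r) * B (A x).

    W is the frozen weight (out x in), A is the down-projection (r x in),
    B is the up-projection (out x r). Assumption: one (A, B) pair would
    play the role of one low-level expert; the paper may differ.
    """
    r = len(A)
    base = [dot(row, x) for row in W]
    down = [dot(row, x) for row in A]
    up = [dot(row, down) for row in B]
    return [b + (alpha / r) * u for b, u in zip(base, up)]


def cross_attention(query, keys, values):
    """Single-query scaled dot-product attention.

    The backbone feature (query) attends over the low-level expert
    features (keys/values) and returns their weighted sum plus the weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


# Toy example: one 2-d backbone feature and three hypothetical expert features.
rgb_feature = [1.0, 0.0]
expert_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, weights = cross_attention(rgb_feature, expert_features, expert_features)

# With B initialised to zeros, the LoRA branch leaves the frozen layer unchanged.
identity = [[1.0, 0.0], [0.0, 1.0]]
unchanged = lora_linear([2.0, 3.0], identity, A=[[1.0, 1.0]], B=[[0.0], [0.0]])
```

In this toy setup the attention weights favour the expert feature most aligned with the backbone feature, which is the intuition behind adaptively fusing low-level cues per image. A real model would use learned query, key, and value projections and batched tensors rather than Python lists.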

