Trinity Detector: text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection
This work addresses the problem of detecting deep forgeries produced by diffusion models, with the aim of ensuring multimedia trustworthiness. It represents an incremental advance in domain-specific detection methods.
The paper tackles the challenge of detecting images generated by diffusion models, which traditional forgery detection methods handle poorly, by proposing the Trinity Detector, which integrates text and pixel features with a multi-spectral attention mechanism. It achieves competitive performance across datasets and up to a 17.6% improvement in transferability for diffusion-generated images.
Artificial Intelligence Generated Content (AIGC) techniques, represented by text-to-image generation, have enabled the malicious use of deep forgeries, raising concerns about the trustworthiness of multimedia content. Adapting traditional forgery detection methods to diffusion models proves challenging. This paper therefore proposes Trinity Detector, a forgery detection method designed explicitly for diffusion models. Trinity Detector incorporates coarse-grained text features through a CLIP encoder and integrates them coherently with fine-grained artifacts in the pixel domain for comprehensive multimodal detection. To heighten sensitivity to the features of diffusion-generated images, a Multi-spectral Channel Attention Fusion Unit (MCAF) is designed; it extracts spectral inconsistencies through adaptive fusion of diverse frequency bands and further integrates the spatial co-occurrence of the two modalities. Extensive experiments show that our Trinity Detector outperforms several state-of-the-art methods: its performance is competitive across all datasets, with up to a 17.6% improvement in transferability on the diffusion datasets.
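The idea of adaptively fusing several frequency bands into a channel attention signal can be illustrated with a small sketch. This is not the paper's MCAF: the abstract gives no implementation details, so the use of 2D DCT bases as the frequency bands, the softmax band weights, the sigmoid gate, and every name below (`dct_basis`, `spectral_pool`, `multispectral_channel_attention`, `band_logits`) are assumptions made purely for illustration. In a trained model the band logits would be learned parameters, and the text branch is omitted entirely.

```python
import math

def dct_basis(h, w, u, v):
    """2D DCT-II basis for the frequency pair (u, v) on an h x w grid."""
    return [[math.cos(math.pi * u * (i + 0.5) / h) *
             math.cos(math.pi * v * (j + 0.5) / w)
             for j in range(w)] for i in range(h)]

def spectral_pool(channel, basis):
    """Average projection of one h x w channel onto a DCT basis (a scalar)."""
    h, w = len(channel), len(channel[0])
    total = sum(channel[i][j] * basis[i][j] for i in range(h) for j in range(w))
    return total / (h * w)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def multispectral_channel_attention(feature_map, freqs, band_logits):
    """Hypothetical sketch: gate each channel by a fused frequency response.

    feature_map: C x H x W nested lists of floats.
    freqs:       list of (u, v) DCT frequency pairs, one per band.
    band_logits: one logit per band (assumed learnable; fixed here).
    Returns (reweighted feature map, per-channel gates).
    """
    h, w = len(feature_map[0]), len(feature_map[0][0])
    bases = [dct_basis(h, w, u, v) for (u, v) in freqs]
    band_weights = softmax(band_logits)  # adaptive fusion weights over bands
    gates = []
    for channel in feature_map:
        responses = [spectral_pool(channel, b) for b in bases]
        fused = sum(wt * r for wt, r in zip(band_weights, responses))
        gates.append(1.0 / (1.0 + math.exp(-fused)))  # sigmoid gate in (0, 1)
    out = [[[g * x for x in row] for row in ch]
           for ch, g in zip(feature_map, gates)]
    return out, gates

if __name__ == "__main__":
    # Two 4 x 4 channels: one constant, one all zeros.
    fm = [[[1.0] * 4 for _ in range(4)], [[0.0] * 4 for _ in range(4)]]
    _, gates = multispectral_channel_attention(fm, [(0, 0), (0, 1)], [0.0, 0.0])
    print(gates)
```

With the (0, 0) band alone, the pooled response reduces to ordinary global average pooling, so adding higher-frequency bands is what lets the gate react to spectral content that a plain average would discard.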