CV · May 13

Venus-DeFakerOne: Unified Fake Image Detection & Localization

arXiv:2605.14091 · 6.42 citations

Predicted impact: top 67% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the fragmentation in fake image detection and localization by providing a unified model that works across diverse forgery types, which is important for security and forensic applications.

The paper proposes DeFakerOne, a unified foundation model for fake image detection and localization that integrates InternVL2 and SAM2, achieving state-of-the-art performance on 39 detection and 9 localization benchmarks with superior robustness against real-world perturbations and advanced generators like GPT-Image-2.

In recent years, the rapid evolution of generative AI has fundamentally reshaped image forgery, breaking the traditional boundaries between document editing, natural image manipulation, DeepFake generation, and full-image AIGC synthesis. Despite this shift toward unified forgery generation, existing research in Fake Image Detection and Localization (FIDL) remains fragmented, creating a mismatch between increasingly unified generation mechanisms and domain-specific detection paradigms. Bridging this mismatch poses two key challenges for FIDL: understanding cross-domain artifact transfer and interference, and building a high-capacity unified foundation model for joint detection and localization. To address these challenges, we propose DeFakerOne, a data-centric, unified FIDL foundation model integrating InternVL2 and SAM2. DeFakerOne performs image-level detection and pixel-level forgery localization simultaneously across diverse scenarios. Extensive experiments demonstrate that DeFakerOne achieves state-of-the-art performance, outperforming baselines on 39 forgery detection benchmarks and 9 localization benchmarks. Furthermore, the model exhibits superior robustness against real-world perturbations and state-of-the-art generators such as GPT-Image-2. Finally, we provide a systematic analysis of data scaling laws, cross-domain artifact transfer and interference patterns, the necessity of fine-grained supervision, and the preservation of original-resolution artifacts, highlighting design principles for scalable, robust, and unified FIDL.
