CV · AI · Feb 21

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

arXiv:2602.18880v1
Originality: Incremental advance
AI Analysis

This work targets media verification and digital forensics, where applications depend on trustworthy visual content. It is an incremental improvement built on a novel hybrid approach.

The paper tackles image forgery detection and localization. Existing methods rely too heavily on semantic content and lack interpretability; the proposed framework is reported to outperform state-of-the-art methods in detection performance and interpretability across the spatial and frequency domains.

Advances in image tampering techniques, particularly generative models, pose significant challenges to media verification, digital forensics, and public trust. Existing image forgery detection and localization (IFDL) methods suffer from two key limitations: over-reliance on semantic content while neglecting textural cues, and limited interpretability of subtle low-level tampering traces. To address these issues, we propose FOCA, a multimodal large language model-based framework that integrates discriminative features from both the RGB spatial and frequency domains via a cross-attention fusion module. This design enables accurate forgery detection and localization while providing explicit, human-interpretable cross-domain explanations. We further introduce FSE-Set, a large-scale dataset with diverse authentic and tampered images, pixel-level masks, and dual-domain annotations. Extensive experiments show that FOCA outperforms state-of-the-art methods in detection performance and interpretability across both spatial and frequency domains.
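The abstract names a cross-attention fusion module over spatial and frequency features but gives no architectural details here. The following is a minimal NumPy sketch of the general idea only, not the authors' implementation. It assumes a grayscale input, a log-magnitude FFT as the frequency view, non-overlapping patch tokens, and random projection weights; all function names (`patch_tokens`, `log_spectrum`, `cross_attention`) and sizes are hypothetical.

```python
import numpy as np

def patch_tokens(x, patch):
    """Split an (H, W) array into non-overlapping patches, one flat token each."""
    h, w = x.shape
    x = x[: h - h % patch, : w - w % patch]  # crop so the patch grid divides evenly
    gh, gw = x.shape[0] // patch, x.shape[1] // patch
    x = x.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return x.reshape(gh * gw, patch * patch)

def log_spectrum(gray):
    """Centered log-magnitude of the 2-D FFT (an assumed frequency-domain view)."""
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(gray))))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, w_q, w_k, w_v):
    """Scaled dot-product attention: queries from one domain, keys/values from the other."""
    q = q_tokens @ w_q
    k = kv_tokens @ w_k
    v = kv_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v, attn

# Hypothetical toy input and sizes, chosen only for illustration.
rng = np.random.default_rng(0)
gray = rng.random((32, 32))
patch, dim = 8, 16

spatial_tok = patch_tokens(gray, patch)             # spatial-domain tokens
freq_tok = patch_tokens(log_spectrum(gray), patch)  # frequency-domain tokens

# Random projections stand in for learned weights (an assumption of this sketch).
w_q, w_k, w_v = (rng.standard_normal((patch * patch, dim)) * 0.1 for _ in range(3))

# Spatial tokens query the frequency tokens; each fused row mixes frequency features.
fused, attn = cross_attention(spatial_tok, freq_tok, w_q, w_k, w_v)
```

In this sketch each spatial token attends over all frequency tokens, so every row of `attn` is a probability distribution over frequency patches. A trained model would learn the projections and would likely use richer encoders for both domains.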

