CV | MM | IV | Jan 5

DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization

arXiv:2601.01784v1 | h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of precisely localizing tampered segments in videos, for applications such as media forensics. The contribution appears incremental: it builds on existing methods by improving how global anomalies are captured.

The paper tackles the problem of temporal forgery localization in videos by proposing DDNet, a dual-stream graph learning and disentanglement framework, which outperforms state-of-the-art methods by approximately 9% in AP@0.95 and shows improved cross-domain robustness.

The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, rendering video-level detection inaccurate and unpersuasive. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, becomes critical. However, existing methods are often constrained by a local view and fail to capture global anomalies. To address this, we propose DDNet, a dual-stream graph learning and disentanglement framework for temporal forgery localization. By coordinating a Temporal Distance Stream for local artifacts and a Semantic Content Stream for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE) to construct a robust feature foundation via deep fusion of hierarchical features. Experiments on the ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9% in AP@0.95, with significant improvements in cross-domain robustness.
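To give a sense of how strict the AP@0.95 setting is, here is a minimal sketch of the temporal IoU (tIoU) matching test that such a metric conventionally relies on. It is not taken from the paper: the function names `temporal_iou` and `is_true_positive` and the example segments are illustrative assumptions, and the full average-precision computation (ranking predictions by confidence and integrating the precision-recall curve) is omitted.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds."""
    # Overlap is clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.95):
    """A prediction matches a ground-truth segment if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical 4-second forged segment and two predicted segments.
gt = (10.0, 14.0)
close_pred = (10.1, 14.0)   # start is off by 0.1 s
loose_pred = (10.3, 14.0)   # start is off by 0.3 s

print(temporal_iou(close_pred, gt), is_true_positive(close_pred, gt))
print(temporal_iou(loose_pred, gt), is_true_positive(loose_pred, gt))
print(is_true_positive(loose_pred, gt, threshold=0.5))
```

In this toy case, a boundary error of 0.3 s on a 4 s segment already fails the 0.95 threshold while still passing at 0.5, which is why gains at AP@0.95 reflect boundary precision rather than coarse detection.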
