CV · AI · Dec 16, 2021

MVSS-Net: Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection

arXiv:2112.08935v3 · 316 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable image manipulation detection in media forensics, offering a method that improves generalization over existing deep learning approaches, though it is incremental in nature.

The paper tackles the detection of image manipulations such as copy-move, splicing, and inpainting. It proposes MVSS-Net and its enhanced version MVSS-Net++, which combine multi-view feature learning (tampering boundary artifacts plus a noise view of the input) with multi-scale supervision (pixel, edge, and image level) to improve generalization and reduce false alarms on authentic images. MVSS-Net++ performs best in both within-dataset and cross-dataset tests and is more robust to JPEG compression, Gaussian blur, and screenshot-based re-capturing.

As manipulating images by copy-move, splicing and/or inpainting may lead to misinterpretation of the visual content, detecting these sorts of manipulations is crucial for media forensics. Given the variety of possible attacks on the content, devising a generic method is nontrivial. Current deep learning based methods are promising when training and test data are well aligned, but perform poorly on independent tests. Moreover, due to the absence of authentic test images, their image-level detection specificity is in doubt. The key question is how to design and train a deep neural network capable of learning generalizable features sensitive to manipulations in novel data, whilst specific to prevent false alarms on the authentic. We propose multi-view feature learning to jointly exploit tampering boundary artifacts and the noise view of the input image. As both clues are meant to be semantic-agnostic, the learned features are thus generalizable. For effectively learning from authentic images, we train with multi-scale (pixel / edge / image) supervision. We term the new network MVSS-Net and its enhanced version MVSS-Net++. Experiments are conducted in both within-dataset and cross-dataset scenarios, showing that MVSS-Net++ performs the best, and exhibits better robustness against JPEG compression, Gaussian blur and screenshot based image re-capturing.
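The multi-scale (pixel / edge / image) supervision described in the abstract can be pictured as a weighted sum of three loss terms. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which loss functions or weights are used, so the soft Dice loss, the binary cross-entropy, the weights `w_pixel` and `w_image`, and all function names here are hypothetical choices made for illustration.

```python
import math


def dice_loss(pred, target, eps=1e-8):
    """Soft Dice loss between flat lists of probabilities and 0/1 labels.

    Assumption: Dice is an illustrative choice for the mask-like terms.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter) / (denom + eps)


def bce_loss(score, label, eps=1e-7):
    """Binary cross-entropy for one image-level score in [0, 1]."""
    s = min(max(score, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label * math.log(s) + (1 - label) * math.log(1.0 - s))


def multi_scale_loss(pred_mask, gt_mask, pred_edge, gt_edge,
                     image_score, image_label,
                     w_pixel=0.4, w_image=0.2):
    """Weighted sum of pixel-, edge-, and image-level losses.

    The weights are hypothetical; the edge term gets the remainder.
    """
    w_edge = 1.0 - w_pixel - w_image
    return (w_pixel * dice_loss(pred_mask, gt_mask)
            + w_edge * dice_loss(pred_edge, gt_edge)
            + w_image * bce_loss(image_score, image_label))


# A manipulated image (non-empty mask, label 1) predicted perfectly.
tampered = multi_scale_loss([1, 0, 0, 1], [1, 0, 0, 1],
                            [1, 0, 0, 1], [1, 0, 0, 1],
                            image_score=1.0, image_label=1)

# An authentic image (all-zero mask and edge map, label 0) with a
# confident "authentic" score versus a false alarm.
zeros = [0, 0, 0, 0]
authentic_ok = multi_scale_loss([0.1] * 4, zeros, [0.1] * 4, zeros,
                                image_score=0.1, image_label=0)
authentic_alarm = multi_scale_loss([0.1] * 4, zeros, [0.1] * 4, zeros,
                                   image_score=0.9, image_label=0)
```

In this sketch, the Dice terms are constant for an all-zero ground-truth mask, so on authentic images only the image-level term distinguishes a correct prediction from a false alarm. This is one way to see why an image-level signal helps when learning from authentic images, though the actual formulation in the paper may differ.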

Code Implementations: 2 repos