Learning Rich Features for Image Manipulation Detection
This work addresses the detection of manipulated images, a problem central to media forensics and security applications, and represents an incremental improvement over existing methods.
The paper tackles image manipulation detection by proposing a two-stream Faster R-CNN network that fuses RGB and noise features to identify tampered regions, achieving state-of-the-art performance on four standard datasets with robustness to resizing and compression.
Image manipulation detection differs from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned. We propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions in a manipulated image. One stream is an RGB stream that extracts features from the RGB image input to find tampering artifacts such as strong contrast differences and unnatural tampered boundaries. The other is a noise stream that leverages noise features extracted by a steganalysis rich model (SRM) filter layer to discover noise inconsistencies between authentic and tampered regions. We then fuse the features from the two streams through a bilinear pooling layer to capture the spatial co-occurrence of the two modalities. Experiments on four standard image manipulation datasets demonstrate that our two-stream framework outperforms each individual stream and achieves state-of-the-art performance compared to alternative methods, with robustness to resizing and compression.
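The noise stream starts from fixed high-pass filters rather than raw pixels. The following is a minimal sketch, assuming three 5x5 residual kernels of the kind commonly used in steganalysis rich models and a single grayscale channel (both assumptions, not details stated in this text). It shows why such filters expose noise rather than content: every kernel sums to zero, so flat regions produce zero response while isolated irregularities do not. The names `SRM_KERNELS`, `filter_valid`, and `noise_features` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
# Each kernel is stored as (integer taps, normalizing divisor) so that
# integer images give exact results. Kernel values are an assumed choice.
Kernel = Tuple[List[List[int]], int]

SRM_KERNELS: List[Kernel] = [
    ([[0, 0, 0, 0, 0],
      [0, -1, 2, -1, 0],
      [0, 2, -4, 2, 0],
      [0, -1, 2, -1, 0],
      [0, 0, 0, 0, 0]], 4),
    ([[-1, 2, -2, 2, -1],
      [2, -6, 8, -6, 2],
      [-2, 8, -12, 8, -2],
      [2, -6, 8, -6, 2],
      [-1, 2, -2, 2, -1]], 12),
    ([[0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 1, -2, 1, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0]], 2),
]


def filter_valid(image: Image, kernel: Kernel) -> Image:
    """Slide a kernel over the image without padding (valid correlation).

    The kernels above are symmetric, so correlation equals convolution here.
    """
    taps, divisor = kernel
    k = len(taps)
    h, w = len(image), len(image[0])
    out: Image = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = sum(taps[a][b] * image[i + a][j + b]
                      for a in range(k) for b in range(k))
            row.append(acc / divisor)
        out.append(row)
    return out


def noise_features(gray: Image) -> List[Image]:
    """Return one residual map per fixed kernel (the noise-stream input)."""
    return [filter_valid(gray, kern) for kern in SRM_KERNELS]


flat = [[128] * 7 for _ in range(7)]
print(noise_features(flat)[0][1][1])  # a flat patch has zero residual
```

Because no padding is used, a 7x7 input yields 3x3 residual maps. In a full detector these maps would feed a learned convolutional backbone; the sketch stops at the fixed filtering step.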
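The fusion step can be illustrated with plain (full) bilinear pooling: take the outer product of the RGB and noise feature vectors at each spatial location, sum over locations, then apply a signed square root and L2 normalization, a common post-processing choice for bilinear features. This is a sketch under assumptions. The inputs stand for per-location feature vectors of one candidate region from each stream, the function name `bilinear_pool` is hypothetical, and the original model may use a compact approximation rather than the full C1 x C2 product.

```python
import math
from typing import List

# One C-dimensional feature vector per spatial location.
Features = List[List[float]]


def bilinear_pool(rgb: Features, noise: Features) -> List[float]:
    """Fuse two streams by summing per-location outer products.

    Returns a flattened C1 * C2 vector after signed square root and
    L2 normalization.
    """
    if len(rgb) != len(noise):
        raise ValueError("both streams must cover the same spatial locations")
    c1, c2 = len(rgb[0]), len(noise[0])
    pooled = [0.0] * (c1 * c2)
    # Co-occurrence: every RGB channel is paired with every noise channel
    # at the same location, and the products are summed over locations.
    for f_rgb, f_noise in zip(rgb, noise):
        for i in range(c1):
            for j in range(c2):
                pooled[i * c2 + j] += f_rgb[i] * f_noise[j]
    # Signed square root dampens large values while keeping their sign.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0 else signed


fused = bilinear_pool([[1.0, 0.0], [0.0, 1.0]], [[2.0], [2.0]])
print(fused)  # two equal entries with unit L2 norm
```

With 2-channel RGB features and 1-channel noise features the fused vector has 2 entries. Real backbones produce hundreds of channels per stream, which makes the full C1 x C2 vector very large and is the usual motivation for compact approximations.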