CV CR Jun 25, 2024

Video Inpainting Localization with Contrastive Learning

arXiv:2406.17628v1 | 6.55 citations | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the forensic detection of malicious video manipulation for security applications and represents an incremental improvement over existing methods.

The paper tackles the problem of blindly identifying inpainted regions in manipulated videos. It proposes ViLocal, which the authors report achieves state-of-the-art performance in video inpainting localization.

Deep video inpainting is often used maliciously to remove important objects when creating fake videos, so it is important to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual to learn effective spatiotemporal forensic features. To enhance discriminative power, supervised contrastive learning is adopted to capture the local inconsistency of inpainted videos by attracting positive pairs and repelling negative pairs of pristine and forged pixels. A lightweight convolutional decoder, trained with a specialized two-stage strategy, produces a pixel-wise inpainting localization map. To prepare enough training samples, we build a video object segmentation dataset of 2500 videos with pixel-level annotations per frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-art methods. Code and dataset will be available at https://github.com/multimediaFor/ViLocal.
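The attract/repel idea in the abstract can be illustrated with a generic supervised contrastive loss over pixel embeddings. The sketch below is not taken from the ViLocal repository: it follows the common SupCon formulation, and the function name `supervised_contrastive_loss`, the `temperature` value, and the assumption that pixel embeddings have already been sampled into an (N, D) array are all hypothetical choices made for illustration.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic SupCon-style loss over N pixel embeddings (illustrative only).

    features: (N, D) array of per-pixel embeddings (assumed already sampled).
    labels:   (N,) array, e.g. 0 = pristine pixel, 1 = inpainted pixel.
    Pixels sharing a label are positives for each other; all others are negatives.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so the dot product is a cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)

    # Pairwise similarities scaled by the temperature.
    sim = (z @ z.T) / temperature

    n = labels.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax of each anchor over all other pixels (numerically stable).
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Average log-probability of the positives, for anchors that have any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())


# Toy example: two clusters of 2-D embeddings.
embeddings = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, 0.1]])
consistent = supervised_contrastive_loss(embeddings, np.array([0, 0, 1, 1]))
inconsistent = supervised_contrastive_loss(embeddings, np.array([0, 1, 0, 1]))
```

In this toy case the loss is small when same-label pixels already sit close together (`consistent`) and large when the labels cut across the clusters (`inconsistent`), which is the pressure that pulls pristine and forged pixel features apart during training.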
