UVL2: A Unified Framework for Video Tampering Localization
The work addresses security risks from malicious video tampering, which can cause public misunderstanding and legal issues, but its contribution is incremental, since it builds on existing detection frameworks.
The paper tackles the detection and localization of tampering in videos, such as inpainting and splicing, by proposing a network that extracts generalized forgery traces. The authors report that it significantly outperforms existing state-of-the-art methods and demonstrates robustness.
With the advancement of deep learning-driven video editing technology, security risks have emerged. Malicious video tampering can lead to public misunderstanding, property losses, and legal disputes. Current detection methods are mostly limited to specific datasets, perform poorly on unknown forgeries, and lack robustness to processed data. This paper proposes an effective video tampering localization network that significantly improves the detection of video inpainting and splicing by extracting more generalized features of forgery traces. Considering the inherent differences between tampered and original videos, such as edge artifacts, pixel distribution, texture features, and compression information, we design four modules that extract these features independently. To integrate these features, we employ a two-stage approach that uses both a Convolutional Neural Network and a Vision Transformer, enabling the model to learn them in a local-to-global manner. Experimental results demonstrate that the method significantly outperforms existing state-of-the-art methods and exhibits robustness.
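To make the four-branch idea concrete, the following is a minimal NumPy sketch of extracting one map per cue (edge, pixel distribution, texture, compression) from a single grayscale frame and stacking them as channels. It is an illustration under stated assumptions, not the paper's modules: the abstract does not specify how each module is built, so the hand-crafted filters here (Sobel magnitude, local variance, Laplacian residual, 8x8 block-boundary jumps) and every function name are hypothetical stand-ins for what are presumably learned components.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same'-size 2-D correlation with edge padding (pure NumPy)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def edge_map(img):
    """Assumed stand-in for edge-artifact cues: Sobel gradient magnitude."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    return np.hypot(conv2d_same(img, kx), conv2d_same(img, kx.T))

def local_variance(img, size=3):
    """Assumed stand-in for pixel-distribution cues: local variance."""
    box = np.full((size, size), 1.0 / (size * size))
    mean = conv2d_same(img, box)
    mean_sq = conv2d_same(img * img, box)
    return np.maximum(mean_sq - mean * mean, 0.0)

def texture_residual(img):
    """Assumed stand-in for texture cues: absolute Laplacian response."""
    lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
    return np.abs(conv2d_same(img, lap))

def blockiness(img, block=8):
    """Assumed stand-in for compression cues: jumps across block boundaries."""
    out = np.zeros(img.shape, dtype=np.float64)
    dx = np.abs(np.diff(img, axis=1))  # shape (H, W-1)
    dy = np.abs(np.diff(img, axis=0))  # shape (H-1, W)
    # Keep only differences that straddle a boundary between two blocks.
    col_mask = np.arange(dx.shape[1]) % block == block - 1
    row_mask = np.arange(dy.shape[0]) % block == block - 1
    out[:, :-1] += dx * col_mask
    out[:-1, :] += dy * row_mask[:, None]
    return out

def extract_trace_features(frame):
    """Run the four hypothetical branches and stack them as (4, H, W)."""
    img = np.asarray(frame, dtype=np.float64)
    return np.stack(
        [edge_map(img), local_variance(img), texture_residual(img), blockiness(img)],
        axis=0,
    )

if __name__ == "__main__":
    # Toy frame: left half 0, right half 1, so the step sits on a block boundary.
    frame = np.zeros((16, 16))
    frame[:, 8:] = 1.0
    feats = extract_trace_features(frame)
    print(feats.shape)
```

In a pipeline like the one described, a stacked tensor of this kind would be the input to the fusion stage; the CNN and Vision Transformer that perform the local-to-global integration are not sketched here.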