Combining Contrastive and Supervised Learning for Video Super-Resolution Detection
This work addresses video super-resolution detection for multimedia forensics, representing an incremental improvement over existing methods.
The authors tackled the problem of detecting upscaled videos, which is challenging due to various upscaling and compression algorithms, by proposing a method that combines contrastive and supervised learning. They demonstrated that their method effectively detects upscaling in compressed videos and outperforms state-of-the-art alternatives, with code and models made publicly available.
Upscaled-video detection is a helpful tool in multimedia forensics, but it is a challenging task because of the variety of upscaling and compression algorithms involved. There are many resolution-enhancement methods, including interpolation and deep-learning-based super-resolution, and each leaves its own characteristic traces. In this work, we propose a new upscaled-resolution-detection method based on learning visual representations with contrastive and cross-entropy losses. To explain how the method detects upscaled videos, we systematically review the major components of our framework; in particular, we show that most data-augmentation approaches hinder the method's learning. Through extensive experiments on various datasets, we demonstrate that our method effectively detects upscaling even in compressed videos and outperforms the state-of-the-art alternatives. The code and models are publicly available at https://github.com/msu-video-group/SRDM
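As an illustration of how a contrastive loss and a cross-entropy loss can be combined into a single training objective, the following is a minimal sketch in plain Python. It is not the implementation from the repository above. The abstract does not specify the form of the contrastive term, so a supervised-contrastive-style loss is assumed here as a stand-in; the function names, the weighting factor `alpha`, the `temperature` value, and the binary labels (0 = native resolution, 1 = upscaled) are likewise hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logit rows."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[y]
    return total / len(labels)


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Assumed contrastive term: pull same-label embeddings together
    and push different-label embeddings apart."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def combined_loss(embeddings, logits, labels, alpha=0.5, temperature=0.1):
    """Weighted sum of the two terms; alpha is a hypothetical weight."""
    contrastive = supervised_contrastive(embeddings, labels, temperature)
    supervised = cross_entropy(logits, labels)
    return alpha * contrastive + (1 - alpha) * supervised


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # toy labels: 0 = native, 1 = upscaled
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    logits = [[2.0, -1.0], [1.5, 0.0], [-1.0, 2.0], [0.0, 1.5]]
    print(combined_loss(embeddings, logits, labels))
```

In this sketch the contrastive term is small when embeddings of the same class cluster together and large when they are interleaved with the other class, while the cross-entropy term supervises the classifier output directly. Setting `alpha` to 0 or 1 recovers either term on its own, which makes it easy to compare each loss in isolation.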