CV · Jan 4, 2023

Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

arXiv:2301.01767v2 · 132 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This paper addresses video forensics, the detection of deepfakes and other manipulated media, and makes an incremental advance by exploiting audio-visual synchronization.

It detects manipulated videos by identifying inconsistencies between their visual and audio signals. Its self-supervised anomaly detection method is trained solely on real, unlabeled data and achieves strong performance on detecting manipulated speech videos.

Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound. At test time, we then flag videos that the model assigns low probability. Despite being trained entirely on real videos, our model obtains strong performance on the task of detecting manipulated speech videos. Project site: https://cfeng16.github.io/audio-visual-forensics
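The idea of flagging low-probability videos can be illustrated with a minimal toy sketch. This is not the paper's model: it swaps in a one-dimensional linear Gaussian AR(1) model for the autoregressive network, and the scalar "synchronization feature", the synthetic data generator `make_real`, and the 1st-percentile threshold are all assumptions made purely for illustration.

```python
import numpy as np

def fit_ar1(sequences):
    """Fit x_t ~ N(a * x_{t-1} + b, sigma^2) by least squares on real sequences."""
    prev = np.concatenate([s[:-1] for s in sequences])
    nxt = np.concatenate([s[1:] for s in sequences])
    a, b = np.polyfit(prev, nxt, 1)          # slope, intercept
    sigma = (nxt - (a * prev + b)).std() + 1e-8
    return a, b, sigma

def mean_log_prob(seq, params):
    """Average per-step log-likelihood of a sequence under the fitted model."""
    a, b, sigma = params
    resid = seq[1:] - (a * seq[:-1] + b)
    log_p = -0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2)
    return float(np.mean(log_p))

def make_real(rng, n=200):
    """Hypothetical stand-in for a real video's feature track: smooth over time."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.9 * x[t - 1] + rng.normal(scale=0.1)
    return x

rng = np.random.default_rng(0)

# Train only on "real" sequences; no manipulated examples or labels are used.
params = fit_ar1([make_real(rng) for _ in range(50)])

# Assumed thresholding rule: 1st percentile of scores on held-out real data.
held_out_scores = [mean_log_prob(make_real(rng), params) for _ in range(50)]
threshold = np.percentile(held_out_scores, 1)

def is_flagged(seq):
    """Flag a sequence as anomalous if the model assigns it low probability."""
    return mean_log_prob(seq, params) < threshold

real_test = make_real(rng)
fake_test = rng.normal(scale=1.0, size=200)  # erratic track standing in for a manipulated video

print("real score:", mean_log_prob(real_test, params))
print("fake score:", mean_log_prob(fake_test, params), "flagged:", is_flagged(fake_test))
```

The point of the sketch is the training signal: the model sees only real data, so anything it finds improbable at test time is treated as a possible manipulation. The choice of threshold trades false alarms on real videos against missed manipulations.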

Code Implementations: 1 repo
