VPN: Video Provenance Network for Robust Content Attribution
This addresses the challenge of tracking video provenance for platforms and users, though it is incremental as it builds on existing contrastive learning and feature fusion techniques.
The paper tackles the problem of robust content attribution for videos shared online by developing VPN, a method that learns search embeddings invariant to common transformations like quality changes or edits, achieving high accuracy recall over a corpus of 100,000 videos.
We present VPN - a content attribution method for recovering provenance information from videos shared online. Platforms, and users, often transform video into different quality, codecs, sizes, shapes, etc. or slightly edit its content such as adding text or emoji, as they are redistributed online. We learn a robust search embedding for matching such video, invariant to these transformations, using full-length or truncated video queries. Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user. We use an inverted index to match temporal chunks of video using late-fusion to combine both visual and audio features. In both cases, features are extracted via a deep neural network trained using contrastive learning on a dataset of original and augmented video clips. We demonstrate high accuracy recall over a corpus of 100,000 videos.