CV MM SD AS

May 1, 2024

Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol

Konstantinos Apostolidis, Jakob Abesser, Luca Cuccovillo, Vasileios Mezaris

arXiv:2405.00384v12.01 citations · h-index: 34 · Has Code · MAD@ICMR

Originality: Incremental advance

AI Analysis

This work addresses content verification for multimedia applications. It is an incremental advance: it builds on existing classification baselines and adds a new experimental protocol and dataset.

The paper tackles the problem of detecting discrepancies between audio and video in multimedia content by designing an audio-visual scene classifier and applying it separately to each modality to identify inconsistencies. It achieves state-of-the-art results in scene classification and promising outcomes in discrepancy detection, introducing a benchmark dataset and experimental protocol for further research.

This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier and compare it with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection, highlighting its potential in content verification applications.
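The comparison step described in the abstract (classify each modality on its own, then check whether the predicted scene classes agree) can be illustrated with a minimal sketch. This is not the authors' implementation: the class list, the function names `argmax` and `detect_discrepancy`, and the use of a plain top-1 comparison are all assumptions made for illustration, and the per-modality scores are assumed to come from some already-trained classifier.

```python
from typing import Dict, Sequence, Union

# Hypothetical scene labels for illustration only; not the paper's class list.
SCENE_CLASSES = ["airport", "park", "metro"]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def detect_discrepancy(
    visual_scores: Sequence[float],
    audio_scores: Sequence[float],
    classes: Sequence[str] = SCENE_CLASSES,
) -> Dict[str, Union[str, bool]]:
    """Compare the top-1 scene class predicted from each modality.

    Assumes both score vectors are aligned with `classes`. A discrepancy is
    flagged when the two top-1 labels differ (an assumed decision rule).
    """
    if len(visual_scores) != len(classes) or len(audio_scores) != len(classes):
        raise ValueError("score vectors must have one entry per class")

    visual_label = classes[argmax(visual_scores)]
    audio_label = classes[argmax(audio_scores)]
    return {
        "visual": visual_label,
        "audio": audio_label,
        "discrepancy": visual_label != audio_label,
    }
```

For example, `detect_discrepancy([0.7, 0.2, 0.1], [0.1, 0.1, 0.8])` labels the visual stream "airport" and the audio stream "metro", so it flags a discrepancy. A practical system might instead compare full score distributions or apply a confidence threshold; the top-1 rule is used here only because it is the simplest reading of "scene-class inconsistencies".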
