CV · Jul 4, 2022

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

arXiv:2207.01241v1 · 7 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses video structuring analysis for applications such as media organization, but it appears incremental because it builds on existing multimodal and sequential methods.

The paper tackles the joint learning problem of scene segmentation and classification in videos by unifying them into a single task of predicting links between adjacent shots, proposing the OS-MSL framework with a DiffCorrNet module. Results show effectiveness against strong baselines on a new large-scale dataset and MovieScenes, though no concrete numbers are provided.
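The link formulation described above can be illustrated with a minimal sketch. It assumes each shot carries a scene id and a category label, and it encodes each adjacent pair of shots as two flags (same scene, same category). This encoding and the function names `shots_to_links` and `links_to_scenes` are hypothetical illustrations, not the label scheme used in the paper.

```python
from typing import List, Tuple


def shots_to_links(scene_ids: List[int], categories: List[str]) -> List[Tuple[int, int]]:
    """For each adjacent shot pair (i, i+1), emit (same_scene, same_category) flags.

    Hypothetical encoding for illustration; the paper's exact link labels may differ.
    """
    if len(scene_ids) != len(categories):
        raise ValueError("scene_ids and categories must have the same length")
    links = []
    for i in range(len(scene_ids) - 1):
        same_scene = int(scene_ids[i] == scene_ids[i + 1])
        same_category = int(categories[i] == categories[i + 1])
        links.append((same_scene, same_category))
    return links


def links_to_scenes(links: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Recover scene spans (start, end), both inclusive, by cutting where same_scene is 0."""
    n_shots = len(links) + 1
    spans = []
    start = 0
    for i, (same_scene, _) in enumerate(links):
        if not same_scene:
            spans.append((start, i))
            start = i + 1
    spans.append((start, n_shots - 1))
    return spans


# Six shots forming three scenes; the last scene has a different category.
scene_ids = [0, 0, 1, 1, 1, 2]
categories = ["news", "news", "news", "news", "news", "sports"]
links = shots_to_links(scene_ids, categories)
spans = links_to_scenes(links)
```

Under this encoding, segmentation reduces to finding pairs whose same-scene flag is 0, and the same sequence of pairwise predictions also carries the category information.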

Scene segmentation and classification (SSC) serve as a critical step in video structuring analysis. Intuitively, learning these two tasks jointly lets them promote each other by sharing common information. However, scene segmentation focuses on the local differences between adjacent shots, while classification needs a global representation of scene segments, so the model is likely to be dominated by one of the two tasks during training. In this paper, to overcome these challenges from an alternative perspective, we unite the two tasks into one through a new formulation, shot link prediction: a link connects two adjacent shots, indicating that they belong to the same scene or category. To this end, we propose a general One Stage Multimodal Sequential Link Framework (OS-MSL) that both distinguishes and leverages the two-fold semantics by recasting the two learning tasks as a unified one. Furthermore, we tailor a specific module called DiffCorrNet to explicitly extract the differences and correlations among shots. We conduct extensive experiments on a brand-new large-scale dataset collected from real-world applications and on MovieScenes. The results on both datasets demonstrate the effectiveness of the proposed method against strong baselines.
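To make "differences and correlations among shots" concrete, the following is a minimal sketch at the feature level. It assumes each shot is already represented by a fixed-length feature vector, and it uses the element-wise difference and cosine similarity of adjacent shots as stand-ins. The abstract does not specify the DiffCorrNet architecture, so this is not that module; the function name `adjacent_diff_and_corr` is hypothetical.

```python
import math
from typing import List, Tuple


def adjacent_diff_and_corr(features: List[List[float]]) -> List[Tuple[List[float], float]]:
    """For each adjacent pair of shot features, return (difference vector, cosine similarity).

    Illustrative assumption only: cosine similarity stands in for "correlation".
    """
    out = []
    for a, b in zip(features, features[1:]):
        diff = [y - x for x, y in zip(a, b)]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        corr = dot / norm if norm > 0 else 0.0  # guard against zero vectors
        out.append((diff, corr))
    return out


# Three toy shot features: the first two are identical, the third is orthogonal.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pairs = adjacent_diff_and_corr(features)
```

In this toy input, the first pair has zero difference and high similarity, which suggests the same scene, while the second pair has a large difference and low similarity, which suggests a boundary.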

