CV · Jun 25, 2022

SC-Transformer++: Structured Context Transformer for Generic Event Boundary Detection

Dexiang Hong, Xiaoqi Ma, Xinyao Wang, Congcong Li, Yufei Wang, Longyin Wen

arXiv:2206.12634v16.55 citations · h-index: 48 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the detection of event boundaries in videos, a computer vision problem. The contribution is incremental: it builds directly on the existing SC-Transformer method.

The authors improve the Structured Context Transformer method for generic event boundary detection in three ways: they add a transformer decoder module, introduce optical flow as a new modality, and use a model ensemble. The resulting system achieves an 86.49% F1 score on the Kinetics-GEBD test set, a 2.86% improvement over the previous state of the art.

This report presents the algorithm used in our submission to the Generic Event Boundary Detection (GEBD) Challenge at CVPR 2022. In this work, we improve the existing Structured Context Transformer (SC-Transformer) method for GEBD. Specifically, a transformer decoder module is added after the transformer encoders to extract high-quality frame features. The final classification is performed jointly on the results of the original binary classifier and a newly introduced multi-class classifier branch. To enrich motion information, optical flow is introduced as a new modality. Finally, a model ensemble is used to further boost performance. The proposed method achieves an 86.49% F1 score on the Kinetics-GEBD test set, which improves the F1 score by 2.86% compared to the previous SOTA method.
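The abstract does not say how the two classifier branches are combined or how the ensemble is formed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that class 0 of the multi-class branch means "no boundary", that the two branches are fused by a plain average, that ensemble members (for example an RGB model and an optical-flow model) are averaged per frame, and that boundaries are picked as local score maxima above a threshold. All function names and the threshold value are hypothetical.

```python
from typing import List, Sequence


def joint_boundary_score(binary_prob: float, multiclass_probs: Sequence[float]) -> float:
    """Fuse the binary and multi-class branches for a single frame.

    Assumption: multiclass_probs[0] is the probability of "no boundary",
    and the two branches are combined by an unweighted average.
    """
    multiclass_boundary = 1.0 - multiclass_probs[0]
    return 0.5 * (binary_prob + multiclass_boundary)


def ensemble_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame boundary scores across models (assumed fusion rule)."""
    if not per_model_scores:
        raise ValueError("at least one model is required")
    n_frames = len(per_model_scores[0])
    if any(len(scores) != n_frames for scores in per_model_scores):
        raise ValueError("all models must score the same number of frames")
    n_models = len(per_model_scores)
    return [
        sum(scores[t] for scores in per_model_scores) / n_models
        for t in range(n_frames)
    ]


def pick_boundaries(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return frame indices that are local maxima with score above threshold."""
    boundaries = []
    for t, score in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t + 1 < len(scores) else float("-inf")
        if score > threshold and score >= left and score > right:
            boundaries.append(t)
    return boundaries


# Hypothetical per-frame scores from two ensemble members.
rgb_scores = [0.1, 0.8, 0.2, 0.1, 0.7]
flow_scores = [0.3, 0.6, 0.2, 0.3, 0.9]
fused = ensemble_scores([rgb_scores, flow_scores])
detected = pick_boundaries(fused, threshold=0.5)
```

With these toy inputs, `detected` holds the indices of the frames whose fused score peaks above the threshold. A weighted average or a learned fusion layer would fit the same interface if the simple mean proves too coarse.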
