Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach
This work addresses video understanding for applications like summarization or editing, but it is incremental as it builds on existing methods for a specific challenge.
The paper tackles generic event boundary detection in videos by proposing a local context modeling and global boundary decoding approach, achieving an 85.13% F1-score on the Kinetics-GEBD testing set, which is a 22% improvement over the baseline.
Generic event boundary detection (GEBD) is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries. In this paper, we present a local context modeling and global boundary decoding approach for GEBD task. Local context modeling sub-network is proposed to perceive diverse patterns of generic event boundaries, and it generates powerful video representations and reliable boundary confidence. Based on them, global boundary decoding sub-network is exploited to decode event boundaries from a global view. Our proposed method achieves 85.13% F1-score on Kinetics-GEBD testing set, which achieves a more than 22% F1-score boost compared to the baseline method. The code is available at https://github.com/JackyTown/GEBD_Challenge_CVPR2022.