Generic Event Boundary Detection Challenge at CVPR 2021 Technical Report: Cascaded Temporal Attention Network (CASTANET)
This work addresses event boundary detection in videos, a task that matters for video analysis applications. The contribution is incremental: it builds on existing methods and adds specific architectural improvements.
The paper tackles generic event boundary detection by proposing a Cascaded Temporal Attention Network (CASTANET), which achieves an 83.30% F1 score on the Kinetics-GEBD test set, a 20.5% F1 improvement over the baseline method.
This report presents the approach used in our submission to the Generic Event Boundary Detection (GEBD) Challenge at CVPR 2021. We design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which consists of three parts: a backbone network, a temporal attention module, and a classification module. Specifically, the Channel-Separated Convolutional Network (CSN) serves as the backbone for feature extraction, and the temporal attention module is designed to make the network focus on discriminative features. The classification module then uses a cascaded architecture to generate more accurate boundaries. In addition, an ensemble strategy further improves the performance of the proposed method. The proposed method achieves an 83.30% F1 score on the Kinetics-GEBD test set, an improvement of 20.5% F1 score over the baseline method. Code is available at https://github.com/DexiangHong/Cascade-PC.
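The abstract does not specify the form of the temporal attention module or of the ensemble strategy, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two generic stand-ins: softmax attention pooling over the time axis, and ensembling by averaging per-frame boundary probabilities. The names `temporal_attention_pool`, `ensemble_average`, and the `query` vector are hypothetical, and the toy numbers are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(features, query):
    """Pool per-frame features (T x C) into a single C-dim vector.

    Assumption: `query` is a hypothetical learned C-dim vector. Each frame
    is scored by its dot product with `query`; a softmax over time turns
    the scores into attention weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in features]
    weights = softmax(scores)
    channels = len(features[0])
    pooled = [
        sum(w * frame[c] for w, frame in zip(weights, features))
        for c in range(channels)
    ]
    return pooled, weights


def ensemble_average(prob_lists):
    """Average boundary probabilities from several models, frame by frame.

    Assumption: plain averaging; the report's ensemble may differ.
    """
    n = len(prob_lists)
    return [sum(p) / n for p in zip(*prob_lists)]


# Toy input: 3 frames with 2 channels each (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
query = [1.0, 0.0]
pooled, weights = temporal_attention_pool(features, query)

# Two hypothetical models, each scoring the same 2 candidate frames.
averaged = ensemble_average([[0.2, 0.8], [0.4, 0.6]])
```

In this toy input the third frame has the largest dot product with `query`, so it receives the largest attention weight and dominates the pooled feature. The cascaded classification module is not modeled here, since the abstract gives no detail on its stages.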