CV · Jun 20, 2021

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

arXiv:2106.11811v1 · 4.75 citations

Originality: Incremental advance

AI Analysis

This work addresses poor detection performance in weakly-supervised temporal action localization, a consequence of the difficulty of separating actions from background. It is an incremental advance because it builds on existing methods such as BaSNet.

The paper tackles the problem of separating foreground actions from background in weakly-supervised temporal action localization by proposing a Local-Global Background Modeling Network (LGBM-Net). An ensemble of such models reaches 22.45% mAP on the HACS Challenge test set.

The Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to recognize action instances in an untrimmed video and localize their temporal starts and ends using only video-level label supervision. Due to the lack of negative samples for the background category, it is difficult for the network to separate foreground from background, which results in poor detection performance. In this report, we present our solution to the 2021 HACS Challenge Weakly-supervised Learning Track, which builds on BaSNet to address this problem. Specifically, we first adopt pre-trained CSN, SlowFast, TDN, and ViViT models as feature extractors to obtain feature sequences. Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances using only video-level labels, based on Multi-Instance Learning (MIL). Finally, we ensemble multiple models to get the final detection results, reaching 22.45% mAP on the test set.
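The abstract names the ingredients (snippet features, MIL training from video-level labels, a BaSNet-style background category, and a model ensemble) but not LGBM-Net's actual equations. The sketch below is a generic, hypothetical illustration of that kind of setup, not the authors' method. It assumes: a T x C class activation sequence (CAS) per video whose last column is an extra background class; top-k mean pooling over time as the MIL aggregation; cross-entropy against the normalized multi-hot video label; and plain averaging of CAS across models as the ensemble step. All function names (`topk_mean_pool`, `video_level_loss`, `ensemble_cas`) are made up for this example.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # T snippets x C classes (last column: background)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_mean_pool(cas: Matrix, k: int) -> List[float]:
    """MIL aggregation (assumed): per class, average its k highest snippet scores."""
    num_classes = len(cas[0])
    k = min(k, len(cas))
    pooled = []
    for c in range(num_classes):
        scores = sorted((row[c] for row in cas), reverse=True)
        pooled.append(sum(scores[:k]) / k)
    return pooled


def video_level_loss(cas: Matrix, action_labels: Sequence[int], k: int,
                     background_positive: bool) -> float:
    """Cross-entropy on pooled scores using only video-level labels.

    background_positive=True labels the extra background class as present
    (every untrimmed video is assumed to contain some background);
    False labels it absent, as a branch that has suppressed background would.
    """
    target = list(action_labels) + [1 if background_positive else 0]
    total = sum(target)
    probs = softmax(topk_mean_pool(cas, k))
    return -sum((t / total) * math.log(p) for t, p in zip(target, probs) if t > 0)


def ensemble_cas(cas_per_model: Sequence[Matrix]) -> Matrix:
    """Hypothetical ensemble: average snippet-level scores of several models."""
    n = len(cas_per_model)
    num_snippets, num_classes = len(cas_per_model[0]), len(cas_per_model[0][0])
    return [[sum(m[t][c] for m in cas_per_model) / n for c in range(num_classes)]
            for t in range(num_snippets)]


# Toy usage: 4 snippets, 2 action classes + 1 background column (made-up numbers).
cas = [[2.0, 0.1, 0.5],
       [1.5, 0.2, 0.4],
       [0.1, 0.0, 3.0],
       [0.2, 0.1, 2.5]]
pooled = topk_mean_pool(cas, k=2)
loss = video_level_loss(cas, action_labels=[1, 0], k=2, background_positive=True)
```

Top-k pooling is a common choice here because only a few snippets of an untrimmed video need to fire for a class to count as present; the localization itself would then come from thresholding the (ensembled) snippet-level scores, a step omitted from this sketch.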

