CV | Nov 10, 2022

Prior-enhanced Temporal Action Localization using Subject-aware Spatial Attention

Yifan Liu, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Haoqian Wang

arXiv:2211.05299v1 | 1.4 | h-index: 32

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurately localizing action boundaries in untrimmed videos. It is an incremental advance: it builds on existing methods by adding subject-aware priors.

The paper tackles the problem of temporal action localization in videos by addressing excessive attention to background and key objects, proposing a method that incorporates action subjects as priors to improve boundary detection, resulting in a performance boost of up to 2.41% mAP on THUMOS-14.

Temporal action localization (TAL) aims to detect the boundaries and identify the class of each action instance in a long untrimmed video. Current approaches treat video frames homogeneously and tend to give background and key objects excessive attention, which limits their sensitivity when localizing action boundaries. To address this, we propose a prior-enhanced temporal action localization method (PETAL), which takes only RGB input and incorporates action subjects as priors. The method leverages action subjects' information with a plug-and-play subject-aware spatial attention module (SA-SAM) to generate an aggregated and subject-prioritized representation. Experimental results on the THUMOS-14 and ActivityNet-1.3 datasets demonstrate that PETAL achieves competitive performance using only RGB features. On THUMOS-14, for example, it improves mAP by 2.41% over the state-of-the-art approach that uses RGB features and by 0.25% over the state-of-the-art approach that additionally uses optical flow features.
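The abstract does not describe how SA-SAM is built, so the following is only a minimal sketch of the general idea of a subject-prioritized spatial aggregation, not the authors' module. It assumes a per-frame feature map and a hypothetical subject mask (for example, from a person detector) and pools the feature map with weights that favor subject locations. The function name `subject_weighted_pool` and the parameter `prior_strength` are illustrative assumptions.

```python
import numpy as np


def subject_weighted_pool(features, subject_mask, prior_strength=1.0):
    """Pool an (H, W, C) feature map into a (C,) vector.

    Hypothetical illustration, not the SA-SAM architecture from the paper.
    features:       float array of shape (H, W, C), one frame's feature map.
    subject_mask:   array of shape (H, W) with values in [0, 1]; higher means
                    "more likely to belong to the action subject" (assumed
                    to come from some external detector).
    prior_strength: how strongly the mask biases the pooling weights;
                    0 reduces to plain average pooling.
    """
    features = np.asarray(features, dtype=np.float64)
    mask = np.asarray(subject_mask, dtype=np.float64)
    if features.ndim != 3 or features.shape[:2] != mask.shape:
        raise ValueError("features must be (H, W, C) and mask must be (H, W)")

    # Softmax over all spatial positions, using the scaled mask as logits.
    logits = prior_strength * mask
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()

    # Weighted sum over the two spatial axes leaves a (C,) descriptor.
    return np.tensordot(weights, features, axes=([0, 1], [0, 1]))
```

With `prior_strength=0` the weights are uniform and the result equals ordinary spatial average pooling, which treats every location homogeneously; larger values shift the pooled representation toward the subject region.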
