Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network
The work addresses efficiency and flexibility issues in video analysis for applications such as surveillance and sports analytics, but the contribution is incremental: it builds on Faster-RCNN with domain-specific adaptations.
The paper tackles temporal action detection in untrimmed videos by proposing Faster-TAD, a unified network that simplifies the pipeline and reports strong results, such as 40.01% mAP on ActivityNet-1.3 and 54.09% mAP on SoccerNet-Action Spotting.
Temporal action detection (TAD) aims to detect the semantic labels and boundaries of action instances in untrimmed videos. Current mainstream approaches are multi-step solutions, which fall short in efficiency and flexibility. In this paper, we propose a unified network for TAD, termed Faster-TAD, by re-purposing a Faster-RCNN-like architecture. To tackle the unique difficulties of TAD, we make important improvements over the original framework: we propose a new Context-Adaptive Proposal Module and an innovative Fake-Proposal Generation Block. In addition, we use atomic action features to improve performance. Faster-TAD simplifies the TAD pipeline and achieves remarkable performance on several benchmarks, namely ActivityNet-1.3 (40.01% mAP), HACS Segments (38.39% mAP), and SoccerNet-Action Spotting (54.09% mAP). It outperforms existing single-network detectors by a large margin.
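As background for the mAP figures above: in temporal action detection, a predicted segment is conventionally matched to a ground-truth action by their temporal intersection-over-union (tIoU). The following is a minimal sketch of that overlap measure, not code from the paper; the function name `temporal_iou` and the (start, end) tuple representation in seconds are assumptions made for illustration.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU of two segments given as (start, end) in seconds.

    Hypothetical helper for illustration; not taken from Faster-TAD.
    """
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Length of the overlap; zero when the segments are disjoint.
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0


# A prediction covering 5-15 s against a ground-truth action at 0-10 s
# overlaps for 5 s out of a 15 s union.
overlap = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

A prediction typically counts as correct only if its label matches and its tIoU with a ground-truth instance exceeds a chosen threshold, which is why boundary accuracy directly affects the reported mAP.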