Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking
This work addresses action proposal generation for video analysis. It is incremental: it builds on existing methods, with improvements in linking and optimization.
The paper tackles the problem of searching action proposals in unconstrained video clips by estimating actionness and linking bounding boxes across frames, achieving state-of-the-art performance on UCF-Sports and UCF-101 datasets in terms of accuracy and proposal quantity.
In this paper, we address the problem of searching action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the bounding boxes belonging to the same actor across frames via linking, association, and tracking to generate spatio-temporally continuous action paths. To this end, a novel actionness estimation method is first proposed that utilizes both human appearance and motion cues. Then, the association of the action paths is formulated as a maximum set coverage problem, with the results of actionness estimation as a prior. To further improve performance, we design an improved optimization objective for the problem and provide a greedy search algorithm to solve it. Finally, a tracking-by-detection scheme is designed to further refine the searched action paths. Extensive experiments on two challenging datasets, UCF-Sports and UCF-101, show that the proposed approach advances state-of-the-art proposal generation performance in terms of both accuracy and proposal quantity.
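The abstract does not spell out the improved objective or the exact greedy procedure, so the following is only a minimal sketch of the textbook greedy heuristic for weighted maximum coverage, which may help illustrate the general idea. Everything in it is an assumption made for illustration: the function name `greedy_max_coverage`, the representation of each candidate path as a set of box identifiers, the per-box actionness weights, and the toy values.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(
    candidates: Dict[str, Set[Hashable]],
    weights: Dict[Hashable, float],
    k: int,
) -> List[str]:
    """Pick up to k candidates, each time taking the one that adds the
    largest total weight of not-yet-covered elements.

    Hypothetical setup: each candidate is a path described by the set of
    box ids it covers, and each box id has an actionness-like weight.
    This is the standard greedy heuristic, not the paper's objective.
    """
    covered: Set[Hashable] = set()
    selected: List[str] = []
    for _ in range(k):
        best_name, best_gain = None, 0.0
        for name, elems in candidates.items():
            if name in selected:
                continue
            # Marginal gain: weight of elements this candidate newly covers.
            gain = sum(weights.get(e, 0.0) for e in elems - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            # No remaining candidate adds any weight; stop early.
            break
        selected.append(best_name)
        covered |= candidates[best_name]
    return selected


# Toy data (made up): three candidate paths over six boxes.
candidates = {
    "path_a": {1, 2, 3},
    "path_b": {3, 4},
    "path_c": {4, 5, 6},
}
weights = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.2, 5: 0.1, 6: 0.1}

print(greedy_max_coverage(candidates, weights, k=2))  # ['path_a', 'path_c']
```

In this toy case, `path_a` is chosen first because it covers the highest-weight boxes. `path_c` then beats `path_b` because box 3 is already covered, so `path_b` would add only box 4.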