CV Jan 2, 2022

TVNet: Temporal Voting Network for Action Localization

Hanyuan Wang, Dima Damen, Majid Mirmehdi, Toby Perrett

arXiv:2201.00434v13.76 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of accurately localizing actions in untrimmed videos, a core task in computer vision, and represents an incremental improvement over prior methods.

The paper tackles action localization in untrimmed videos by proposing TVNet, which uses a Voting Evidence Module to locate temporal boundaries. It reports an average mAP of 34.6% on ActivityNet-1.3 and, on THUMOS14 at 0.5 IoU, an mAP of 56.0% when combined with PGCN and 59.1% when combined with MUSES.

We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. This incorporates a novel Voting Evidence Module to locate temporal boundaries more accurately, where temporal contextual evidence is accumulated to predict frame-level probabilities of start and end action boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, particularly outperforming previous methods at the highest IoU threshold of 0.95. TVNet also achieves an mAP of 56.0% when combined with PGCN and 59.1% with MUSES at 0.5 IoU on THUMOS14, and outperforms prior work at all thresholds. Our code is available at https://github.com/hanielwang/TVNet.
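
The results above are reported as mAP at temporal IoU thresholds (0.5 on THUMOS14, up to 0.95 on ActivityNet-1.3). As a minimal sketch of what such a threshold means, and not code taken from the TVNet repository, the snippet below computes the temporal IoU between a predicted and a ground-truth segment and checks it against a threshold. The function names `temporal_iou` and `matches_at_threshold` and the (start, end) tuple layout in seconds are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def matches_at_threshold(pred, gt, threshold):
    """True if the prediction overlaps the ground truth by at least `threshold`."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    ground_truth = (0.0, 10.0)   # hypothetical annotated action
    loose_pred = (5.0, 15.0)     # boundaries off by 5 s
    tight_pred = (0.2, 10.0)     # start boundary off by 0.2 s
    print(matches_at_threshold(loose_pred, ground_truth, 0.5))
    print(matches_at_threshold(tight_pred, ground_truth, 0.95))
```

At a threshold of 0.95, a prediction counts as correct only if its boundaries are almost exactly aligned with the annotation, which is why gains at that threshold are read as evidence of more precise boundary localization.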
