CV · Jan 2, 2022

TVNet: Temporal Voting Network for Action Localization

arXiv:2201.00434v1 · 6 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the problem of accurately localizing the start and end of actions in videos, and represents an incremental improvement over prior methods.

The paper tackles action localization in untrimmed videos by proposing TVNet, whose Voting Evidence Module locates temporal boundaries. It achieves an average mAP of 34.6% on ActivityNet-1.3 and, at 0.5 IoU on THUMOS14, 56.0% mAP when combined with PGCN and 59.1% when combined with MUSES.

We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. It incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, particularly outperforming previous methods at the highest IoU threshold of 0.95. TVNet also achieves an mAP of 56.0% when combined with PGCN and 59.1% when combined with MUSES at 0.5 IoU on THUMOS14, and outperforms prior work at all thresholds. Our code is available at https://github.com/hanielwang/TVNet.
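The mAP figures above are reported at temporal IoU thresholds (for example 0.5 and 0.95). The minimal sketch below shows how the overlap between a predicted and a ground-truth segment is usually measured, and why a threshold of 0.95 demands very precise boundaries. It is not taken from the TVNet repository; the names `temporal_iou` and `is_true_positive`, and the example segments, are hypothetical and chosen only for illustration.

```python
from typing import Tuple

# A segment is (start_time, end_time) in seconds. Illustrative type, not from TVNet.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    # Overlap is zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Segment, gt: Segment, threshold: float) -> bool:
    """A prediction matches a ground-truth action if its tIoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    gt = (10.0, 20.0)
    loose = (11.0, 20.0)   # start boundary is off by one second
    tight = (10.2, 20.1)   # both boundaries are off by at most 0.2 seconds
    for name, pred in [("loose", loose), ("tight", tight)]:
        print(name, round(temporal_iou(pred, gt), 3),
              is_true_positive(pred, gt, 0.5),
              is_true_positive(pred, gt, 0.95))
```

In this example a one-second error on a ten-second action still counts as correct at 0.5 IoU but fails at 0.95, which is why accurate boundary localization matters most at the highest threshold.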

Code Implementations: 1 repo
