CV · Mar 9, 2017

UntrimmedNets for Weakly Supervised Action Recognition and Detection

arXiv:1703.03329v2 · 520 citations
AI Analysis

This work addresses the costly need for trimmed video datasets in action recognition by learning directly from untrimmed videos, a more efficient route for video analysis tasks.

The paper tackles action recognition and detection in untrimmed videos without temporal annotations. It proposes UntrimmedNet, a weakly supervised architecture that couples a classification module with a selection module, and it reports performance superior or comparable to strongly supervised methods on the THUMOS14 and ActivityNet datasets.

Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action recognition models from untrimmed videos without the requirement of temporal annotations of action instances. Our UntrimmedNet couples two important components, the classification module and the selection module, to learn the action models and reason about the temporal duration of action instances, respectively. These two components are implemented with feed-forward networks, and UntrimmedNet is therefore an end-to-end trainable architecture. We exploit the learned models for weakly supervised action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet only employs weak supervision, our method achieves performance superior or comparable to that of strongly supervised approaches on these two datasets.
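
To make the coupling of the two modules concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each video has already been split into clips with fixed feature vectors, that both modules are single linear layers, and that the selection module acts as a softmax attention over clips. The names `video_prediction`, `class_weights`, and `select_weights` are hypothetical and chosen only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, features):
    """Apply one linear layer (no bias): one output per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def video_prediction(clip_features, class_weights, select_weights):
    """Fuse per-clip class scores into one video-level class distribution.

    clip_features : list of per-clip feature vectors (assumed precomputed)
    class_weights : one weight row per action class (classification module)
    select_weights: a single weight vector (selection module, assumed form)
    """
    # Classification module: raw class scores for every clip.
    clip_scores = [linear(class_weights, f) for f in clip_features]

    # Selection module: one importance score per clip, normalised over clips.
    raw_importance = [linear([select_weights], f)[0] for f in clip_features]
    attention = softmax(raw_importance)

    # Video-level scores: attention-weighted sum of the clip scores.
    num_classes = len(class_weights)
    video_scores = [
        sum(a * s[c] for a, s in zip(attention, clip_scores))
        for c in range(num_classes)
    ]
    return softmax(video_scores), attention


# Toy example: three clips, two-dimensional features, two action classes.
clips = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
class_w = [[2.0, -1.0], [-1.0, 2.0]]
select_w = [1.5, -0.5]
probs, attention = video_prediction(clips, class_w, select_w)
```

Under these assumptions, a training loss would only need the video-level label to compare against `probs`, which is what makes the supervision weak. The per-clip `attention` weights indicate which clips drove the prediction, so they could be thresholded to propose temporal extents for detection.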

Code Implementations: 2 repos