CV | Jul 21, 2015

Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

arXiv:1507.05738v3 | 464 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for detailed action labeling in unconstrained videos. It is an incremental advance because it builds on existing datasets and methods.

The authors tackle dense, multi-label action recognition in complex videos by introducing the MultiTHUMOS dataset and a novel LSTM variant. The model improves labeling accuracy and enables deeper understanding tasks such as structured retrieval and action prediction.

Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory (LSTM) deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
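The dense, multi-label setting described in the abstract can be pictured as a frames-by-classes binary matrix, where each frame may carry several action labels at once. The sketch below is illustrative only and is not the authors' code: the function name `dense_labels`, the `(class, start, end)` annotation format in seconds, the frame rate, and the class names are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical annotation format: (class name, start time in s, end time in s).
Annotation = Tuple[str, float, float]


def dense_labels(
    annotations: Sequence[Annotation],
    num_frames: int,
    fps: float,
    classes: Sequence[str],
) -> List[List[int]]:
    """Turn interval annotations into a per-frame multi-hot label matrix.

    Row t holds one 0/1 entry per class; several entries can be 1 in the
    same row when actions overlap in time.
    """
    index = {name: i for i, name in enumerate(classes)}
    labels = [[0] * len(classes) for _ in range(num_frames)]
    for name, start_s, end_s in annotations:
        first = max(0, int(start_s * fps))
        last = min(num_frames, int(end_s * fps))  # exclusive end frame
        for t in range(first, last):
            labels[t][index[name]] = 1
    return labels


# Illustrative input: a 3-second clip at 2 frames per second with two
# overlapping actions (class names are made up for the example).
example = dense_labels(
    annotations=[("run", 0.0, 2.0), ("jump", 1.0, 1.5)],
    num_frames=6,
    fps=2.0,
    classes=["run", "jump"],
)
```

In this toy clip, the frame at index 2 is labeled with both classes, which is the situation a single-label-per-frame formulation cannot represent.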

Code Implementations: 1 repo
