CV · Jul 7, 2016

Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge

arXiv:1607.01979v2 · 91 citations
AI Analysis

This work addresses the challenge of identifying and localizing activities in untrimmed videos. Its contribution is incremental: it builds on existing classification methods and adapts them for detection.

The paper tackles temporal activity detection in untrimmed videos. It uses global video-level classification to predict labels, and combines frame-level binary classification with dynamic programming to generate temporally trimmed activity proposals. The result demonstrates that untrimmed classification models can serve as a foundation for detection tasks.
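The frame-level step can be illustrated with a generic two-state dynamic program. This is a sketch, not the authors' code. It assumes each frame already has an activity score in [0, 1] from some binary classifier, rewards label 1 with that score and label 0 with one minus it, and charges a hypothetical `switch_penalty` for every label change so that isolated noisy frames do not fragment a proposal. The names `dp_binary_labeling` and `segments_from_labels` are illustrative.

```python
from typing import List, Tuple


def dp_binary_labeling(scores: List[float], switch_penalty: float) -> List[int]:
    """Assign each frame 1 (activity) or 0 (background), maximizing the sum of
    per-frame rewards minus switch_penalty for every label change."""
    n = len(scores)
    if n == 0:
        return []
    # Assumed rewards: background earns 1 - s, activity earns s.
    unary = [(1.0 - s, s) for s in scores]
    best = [list(unary[0])]  # best[t][l]: best total ending at frame t with label l
    back = [[0, 1]]          # back[t][l]: label at frame t - 1 on that best path
    for t in range(1, n):
        row, ptr = [0.0, 0.0], [0, 0]
        for lab in (0, 1):
            stay = best[t - 1][lab]
            switch = best[t - 1][1 - lab] - switch_penalty
            row[lab] = max(stay, switch) + unary[t][lab]
            ptr[lab] = lab if stay >= switch else 1 - lab
        best.append(row)
        back.append(ptr)
    # Backtrack from the better final label.
    lab = 0 if best[-1][0] >= best[-1][1] else 1
    labels = [0] * n
    for t in range(n - 1, -1, -1):
        labels[t] = lab
        lab = back[t][lab]
    return labels


def segments_from_labels(labels: List[int], scores: List[float]) -> List[Tuple[int, int, float]]:
    """Turn runs of 1s into (start_frame, end_frame, mean_score) proposals."""
    segments = []
    start = None
    for t, lab in enumerate(labels + [0]):  # trailing 0 closes a final run
        if lab == 1 and start is None:
            start = t
        elif lab == 0 and start is not None:
            segments.append((start, t - 1, sum(scores[start:t]) / (t - start)))
            start = None
    return segments


scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.85, 0.9, 0.1, 0.05]
print(segments_from_labels(dp_binary_labeling(scores, 0.0), scores))
print(segments_from_labels(dp_binary_labeling(scores, 1.0), scores))
```

With a penalty of 0 the labeling reduces to thresholding each frame at 0.5, so the dip at frame 4 splits the activity into two proposals (frames 2-3 and 5-6); with a penalty of 1.0 they merge into a single proposal spanning frames 2-6. A larger penalty therefore trades temporal precision for fewer, longer proposals.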

Current state-of-the-art human activity recognition focuses on the classification of temporally trimmed videos in which only one action occurs per frame. We propose a simple yet effective method for the temporal detection of activities in temporally untrimmed videos with the help of untrimmed classification. Firstly, our model predicts the top k labels for each untrimmed video by analysing global video-level features. Secondly, frame-level binary classification is combined with dynamic programming to generate the temporally trimmed activity proposals. Finally, each proposal is assigned a label based on the global label, and scored using both the temporal activity proposal score and the global score. Ultimately, we show that untrimmed video classification models can be used as a stepping stone for temporal detection.
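The first and last steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: video-level class scores are given as a dictionary, every proposal is paired with each of the top k global labels, and the proposal score and global score are combined by multiplication. The pairing rule, the product, the class names in the example, and the names `top_k_labels` and `label_proposals` are all hypothetical.

```python
from typing import Dict, List, Tuple

Proposal = Tuple[int, int, float]        # (start_frame, end_frame, proposal_score)
Detection = Tuple[int, int, str, float]  # (start_frame, end_frame, label, score)


def top_k_labels(video_scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, global_score) pairs for one video."""
    return sorted(video_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def label_proposals(proposals: List[Proposal], video_scores: Dict[str, float],
                    k: int = 1) -> List[Detection]:
    """Pair every proposal with each top-k global label (an assumption) and
    score it as proposal_score * global_score (also an assumption)."""
    detections: List[Detection] = []
    for label, global_score in top_k_labels(video_scores, k):
        for start, end, proposal_score in proposals:
            detections.append((start, end, label, proposal_score * global_score))
    # Rank detections by combined score, highest first.
    detections.sort(key=lambda d: d[3], reverse=True)
    return detections


# Hypothetical video-level scores and proposals for one video.
video_scores = {"long_jump": 0.7, "high_jump": 0.2, "shot_put": 0.1}
proposals = [(2, 6, 0.75), (10, 14, 0.5)]
for det in label_proposals(proposals, video_scores, k=2):
    print(det)
```

Multiplying the two scores means a detection ranks highly only when the temporal evidence and the video-level evidence agree; a weighted sum would be an equally plausible choice.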

Code Implementations: 1 repo
