CVNov 30, 2017

Budget-Aware Activity Detection with A Recurrent Policy Network

arXiv:1712.00097v26 citations
Originality Highly original
AI Analysis

This work addresses the practical problem of slow inference for real-world activity detection systems by significantly reducing processing time.

This paper tackles the problem of slow inference times in temporal activity detection for untrimmed long videos. The authors propose a budget-aware framework that intelligently selects a small subset of frames, achieving competitive detection accuracy while reducing computation time to 0.35s per video.

In this paper, we address the challenging problem of efficient temporal activity detection in untrimmed long videos. While most recent work has focused and advanced the detection accuracy, the inference time can take seconds to minutes in processing each single video, which is too slow to be useful in real-world settings. This motivates the proposed budget-aware framework, which learns to perform activity detection by intelligently selecting a small subset of frames according to a specified time budget. We formulate this problem as a Markov decision process, and adopt a recurrent network to model the frame selection policy. We derive a recurrent policy gradient based approach to approximate the gradient of the non-decomposable and non-differentiable objective defined in our problem. In the extensive experiments, we achieve competitive detection accuracy, and more importantly, our approach is able to substantially reduce computation time and detect multiple activities with only 0.35s for each untrimmed long video.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes