CV · Nov 20, 2018

A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos

arXiv:1811.08496v2 · 40 citations
Originality: Incremental advance
AI Analysis

This paper addresses action detection in security videos. It is an incremental advance: it builds on existing proposal-based methods and adds specific refinements.

The paper tackles spatio-temporal action detection in untrimmed security videos with a two-stage system: dense proposal generation followed by a Temporal Refinement I3D (TRI-3D) network. The system is shown to be effective on the DIVA dataset and is also evaluated on THUMOS14.

Existing approaches for spatio-temporal action detection in videos are limited by the spatial extent and temporal duration of the actions. In this paper, we present a modular system for spatio-temporal action detection in untrimmed security videos. We propose a two-stage approach. The first stage generates dense spatio-temporal proposals using hierarchical clustering and temporal jittering techniques on frame-wise object detections. The second stage is a Temporal Refinement I3D (TRI-3D) network that performs action classification and temporal refinement on the generated proposals. The object detection-based proposal generation step helps in detecting actions occurring in a small spatial region of a video frame, while temporal jittering and refinement help in detecting actions of variable lengths. Experimental results on the DIVA spatio-temporal action detection dataset show the effectiveness of our system. For comparison, the performance of our system is also evaluated on the THUMOS14 temporal action detection dataset.
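To make the temporal jittering idea concrete, here is a minimal sketch of how one proposal window might be expanded into several shifted and rescaled copies, with a temporal IoU helper for matching them against ground truth. The abstract does not give the jitter scheme, so the function names (`jitter_window`, `temporal_iou`), the shift and scale parameters, and the frame-index convention are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical convention: (start_frame, end_frame), end exclusive.
Window = Tuple[int, int]


def jitter_window(window: Window, shifts: List[float], scales: List[float],
                  num_frames: int) -> List[Window]:
    """Return temporally jittered copies of `window`, clipped to the video.

    `shifts` move the window centre by a fraction of the original length;
    `scales` multiply the original length. Both are assumed parameters.
    """
    start, end = window
    length = end - start
    center = (start + end) / 2.0
    jittered: List[Window] = []
    for scale in scales:
        new_len = max(1, round(length * scale))
        for shift in shifts:
            new_center = center + shift * length
            s = int(round(new_center - new_len / 2.0))
            e = s + new_len
            # Clip to the valid frame range of the video.
            s, e = max(0, s), min(num_frames, e)
            if e > s and (s, e) not in jittered:
                jittered.append((s, e))
    return jittered


def temporal_iou(a: Window, b: Window) -> float:
    """Intersection over union of two temporal windows."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    proposals = jitter_window((100, 200), shifts=[-0.5, 0.0, 0.5],
                              scales=[1.0], num_frames=1000)
    print(proposals)
    print([temporal_iou(p, (120, 220)) for p in proposals])
```

Producing several windows per proposal raises the chance that at least one overlaps well with an action of unknown length; a refinement stage such as TRI-3D can then adjust the boundaries of the best candidates.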

