CV · Nov 20, 2018

A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos

Joshua Gleason, Rajeev Ranjan, Steven Schwarcz, Carlos D. Castillo, Jun-Cheng Chen, Rama Chellappa

arXiv:1811.08496v2 · 10.740 citations

Originality: Incremental advance

AI Analysis

This paper addresses action detection in security videos. Its contribution is incremental: it builds on existing proposal-based methods and adds specific refinements.

The paper tackles spatio-temporal action detection in untrimmed security videos with a two-stage system: dense proposal generation followed by a Temporal Refinement I3D (TRI-3D) network. It reports results on the DIVA dataset and, for comparison, on the THUMOS14 temporal action detection dataset.

Existing approaches for spatio-temporal action detection in videos are limited by the spatial extent and temporal duration of the actions. In this paper, we present a modular system for spatio-temporal action detection in untrimmed security videos. We propose a two-stage approach. The first stage generates dense spatio-temporal proposals using hierarchical clustering and temporal jittering techniques on frame-wise object detections. The second stage is a Temporal Refinement I3D (TRI-3D) network that performs action classification and temporal refinement on the generated proposals. The object detection-based proposal generation step helps in detecting actions occurring in a small spatial region of a video frame, while temporal jittering and refinement help in detecting actions of variable lengths. Experimental results on DIVA, a spatio-temporal action detection dataset, show the effectiveness of our system. For comparison, the performance of our system is also evaluated on the THUMOS14 temporal action detection dataset.
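The first stage described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: the single-linkage clustering rule, the distance that mixes frame index with pixel coordinates, the `max_dist` threshold, and the function names `cluster_detections`, `cluster_to_proposal`, and `jitter_proposal` are all assumptions made for illustration. Each detection is reduced to a hypothetical `(frame, cx, cy)` tuple.

```python
from itertools import product


def cluster_detections(dets, max_dist):
    """Single-linkage agglomerative clustering of (frame, cx, cy) detections.

    Assumption: Euclidean distance over (frame, cx, cy) without rescaling;
    a real system would likely weight time and space differently.
    """
    clusters = [[d] for d in dets]

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(
            sum((p[k] - q[k]) ** 2 for k in range(3)) ** 0.5
            for p in a
            for q in b
        )

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_dist:
            break  # the closest remaining pair is too far apart to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_to_proposal(cluster):
    """Temporal extent (first frame, last frame) covered by a cluster."""
    frames = [d[0] for d in cluster]
    return (min(frames), max(frames))


def jitter_proposal(start, end, offsets, num_frames):
    """Temporal jittering: shift start and end by every offset combination,
    clip to the video, and drop empty or inverted intervals."""
    jittered = set()
    for ds, de in product(offsets, repeat=2):
        s = max(0, start + ds)
        e = min(num_frames - 1, end + de)
        if s < e:
            jittered.add((s, e))
    return sorted(jittered)


# Toy input: two well-separated groups of detections.
detections = [(0, 10, 10), (1, 11, 10), (2, 12, 11), (50, 200, 200), (51, 201, 200)]
proposals = [cluster_to_proposal(c) for c in cluster_detections(detections, 5.0)]
dense = jitter_proposal(10, 20, (-2, 0, 2), 100)
```

In this sketch, jittering turns one temporal window into several overlapping ones, which is one plausible way to obtain the dense proposals that a second-stage classifier could then refine.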
