CV · Apr 1, 2019

Dance with Flow: Two-in-One Stream Action Detection

arXiv:1904.00696v3 · 92 citations
Originality: Incremental advance
AI Analysis

This work addresses the computational inefficiency of two-stream action detection networks by replacing the separate RGB and flow streams with a single network that needs roughly half the computation.

The paper tackles action detection with a two-in-one stream network that embeds RGB and optical flow in a single model. This halves the computation and parameter count of two-stream methods while still achieving strong results on UCF101-24, UCFSports, and J-HMDB.
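The "half the computation" claim follows from dropping the second backbone. As a rough illustration, assume both streams of a two-stream detector share the same backbone with cost $C$ (parameters or FLOPs), and let $C_{\text{MC}}$ and $C_{\text{MM}}$ be the cost of the added motion condition and motion modulation layers. These symbols are introduced here for illustration only, and fusion overhead is ignored:

```latex
\frac{C_{\text{two-in-one}}}{C_{\text{two-stream}}}
  = \frac{C + C_{\text{MC}} + C_{\text{MM}}}{2C}
  \approx \frac{1}{2}
  \quad \text{when } C_{\text{MC}} + C_{\text{MM}} \ll C
```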

The goal of this paper is to detect the spatio-temporal extent of an action. Two-stream detection networks based on RGB and flow provide state-of-the-art accuracy at the cost of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with two new layers. A motion condition layer extracts motion information from flow images, and a motion modulation layer uses this information to generate transformation parameters that modulate the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks and is trained end-to-end. Experiments demonstrate that using the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCFSports and J-HMDB.
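To make the two new layers concrete, here is a minimal NumPy sketch. It assumes an affine scale-and-shift modulation in the style of FiLM. The motion condition (MC) and motion modulation (MM) layers are stood in for by 1x1 convolutions. The layer sizes, function names and initialisation are hypothetical and are not the authors' architecture. The flow map is also assumed to be a 2-channel (dx, dy) field already resized to the spatial size of the RGB features.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w, b):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def init_params(flow_ch, cond_ch, feat_ch):
    """Hypothetical weights for the assumed MC and MM layers."""
    return {
        # MC: flow channels -> condition channels
        "mc_w": rng.normal(scale=0.1, size=(cond_ch, flow_ch)), "mc_b": np.zeros(cond_ch),
        # MM scale branch: bias of 1 so gamma starts near 1
        "g_w": rng.normal(scale=0.1, size=(feat_ch, cond_ch)), "g_b": np.ones(feat_ch),
        # MM shift branch: bias of 0 so beta starts near 0
        "b_w": rng.normal(scale=0.1, size=(feat_ch, cond_ch)), "b_b": np.zeros(feat_ch),
    }

def motion_condition(flow, p):
    """Assumed MC layer: map the flow field to a motion condition feature map."""
    return relu(conv1x1(flow, p["mc_w"], p["mc_b"]))

def motion_modulation(cond, p):
    """Assumed MM layer: predict per-pixel, per-channel scale (gamma) and shift (beta)."""
    gamma = conv1x1(cond, p["g_w"], p["g_b"])
    beta = conv1x1(cond, p["b_w"], p["b_b"])
    return gamma, beta

def two_in_one_block(rgb_feat, flow, p):
    """Modulate low-level RGB features with motion-derived affine parameters."""
    gamma, beta = motion_modulation(motion_condition(flow, p), p)
    return gamma * rgb_feat + beta

# Usage: 16-channel low-level RGB features on an 8x8 grid, plus a matching flow field.
rgb_feat = rng.normal(size=(16, 8, 8))
flow = rng.normal(size=(2, 8, 8))
p = init_params(flow_ch=2, cond_ch=4, feat_ch=16)
out = two_in_one_block(rgb_feat, flow, p)
print(out.shape)  # (16, 8, 8)
```

With the scale bias initialised to one and the shift bias to zero, the block is an identity mapping when the flow is zero. In that case the RGB features pass through unchanged. This is a design choice of the sketch and is not stated in the summary.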

Code implementations: 1 repo