CV | May 9, 2017

Deep Spatio-temporal Manifold Network for Action Recognition

Ce Li, Chen Chen, Baochang Zhang, Qixiang Ye, Jungong Han, Rongrong Ji

arXiv:1705.03148v1 | 1.75 citations

Originality: Highly original

AI Analysis

This work addresses action recognition in videos, an important computer vision task, with a novel method that incorporates manifold priors into deep learning.

The paper tackles the problem of action recognition in videos by leveraging manifold structure to constrain deep feature learning, reducing intra-class variations and alleviating over-fitting. The proposed Spatio-Temporal Manifold Network (STMN) shows significant improvements over baselines on two benchmark datasets.

Visual data such as videos are often sampled from a complex manifold. We propose leveraging the manifold structure to constrain deep action feature learning, thereby minimizing intra-class variations in the feature space and alleviating over-fitting. Since the manifold structure can be transferred, layer by layer, from the data domain to the deep features, the manifold prior is imposed from the top layer into the backpropagation learning procedure of a convolutional neural network (CNN). The resulting algorithm, the Spatio-Temporal Manifold Network (STMN), is solved with the efficient Alternating Direction Method of Multipliers and Backward Propagation (ADMM-BP). We show theoretically that STMN recasts the problem as a projection onto the manifold via an embedding method. The proposed approach is evaluated on two benchmark datasets, showing significant improvements over the baselines.
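The abstract describes solving STMN with ADMM-BP, which alternates backpropagation-style updates with a projection onto the manifold. The following is a minimal sketch of generic scaled-form ADMM under strong simplifying assumptions, not the authors' STMN formulation: the unit sphere stands in for the data manifold (so projection is plain normalization), a quadratic loss 0.5·||x − target||² stands in for the network loss, and a few plain gradient steps on `x` stand in for backpropagation. The names `admm_bp`, `project_to_sphere`, and `grad_loss` and the parameters `rho`, `lr`, `outer`, and `inner` are hypothetical.

```python
import math


def project_to_sphere(v):
    """Euclidean projection onto the unit sphere (a stand-in manifold)."""
    n = math.sqrt(sum(t * t for t in v))
    if n == 0.0:
        raise ValueError("projection onto the sphere is undefined at the origin")
    return [t / n for t in v]


def grad_loss(x, target):
    """Gradient of the stand-in loss 0.5 * ||x - target||^2."""
    return [xi - ti for xi, ti in zip(x, target)]


def admm_bp(target, rho=1.0, lr=0.1, outer=200, inner=20):
    """Hypothetical ADMM loop: minimize loss(x) subject to x lying on the manifold.

    Splitting: minimize loss(x) + indicator_M(z) subject to x = z,
    with u as the scaled dual variable.
    """
    d = len(target)
    x = [0.0] * d                     # variable updated by gradient ("BP") steps
    z = project_to_sphere([1.0] * d)  # copy of x constrained to the manifold
    u = [0.0] * d                     # scaled dual variable for x = z
    for _ in range(outer):
        # x-update: a few gradient steps on loss(x) + (rho/2) * ||x - z + u||^2
        for _ in range(inner):
            g = grad_loss(x, target)
            x = [xi - lr * (gi + rho * (xi - zi + ui))
                 for xi, gi, zi, ui in zip(x, g, z, u)]
        # z-update: project x + u onto the manifold
        z = project_to_sphere([xi + ui for xi, ui in zip(x, u)])
        # dual update: accumulate the constraint violation x - z
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return x, z


if __name__ == "__main__":
    x, z = admm_bp([3.0, 4.0])
    print(x, z)
```

For `target = [3.0, 4.0]`, the constrained minimizer on the unit sphere is the normalized target `[0.6, 0.8]`, so both `x` and `z` should approach it as the dual updates drive `x - z` to zero. Splitting the constraint into a separate variable `z` is what lets the gradient step and the manifold projection be handled independently.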
