CV · AI · HC · MM · NE · Apr 3, 2017

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

arXiv:1704.00616v2 · 224 citations
AI Analysis

It improves action classification and detection for video analysis, but is incremental as it builds on existing multi-cue approaches.

The paper tackles action recognition by integrating pose, motion, and appearance cues with a Markov chain model, achieving state-of-the-art action classification on HMDB51, J-HMDB, and NTU RGB+D and state-of-the-art spatio-temporal action localization on UCF101 and J-HMDB.

General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.
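The idea of a chain that "adds cues successively" can be illustrated with a minimal sketch. It is not the authors' implementation: the cue order (pose, then motion, then appearance, following the title), the feature sizes, the random weights, and the names `ChainStage` and `chained_predict` are all assumptions made for illustration. Each stage fuses its own cue with the hidden state passed on by the previous stage and emits a refined class distribution.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, inputs):
    """Matrix-vector product with weights stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def random_weights(rng, n_out, n_in):
    """Placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


class ChainStage:
    """Hypothetical chain link: fuses one cue with the previous hidden state."""

    def __init__(self, rng, cue_dim, prev_dim, hidden_dim, num_classes):
        self.w_hidden = random_weights(rng, hidden_dim, cue_dim + prev_dim)
        self.w_out = random_weights(rng, num_classes, hidden_dim)

    def forward(self, cue, prev_hidden):
        fused = cue + prev_hidden  # list concatenation of cue and chain state
        hidden = [math.tanh(v) for v in linear(self.w_hidden, fused)]
        return hidden, softmax(linear(self.w_out, hidden))


def chained_predict(stages, cues):
    """Run the chain; each stage sees its cue plus the state of the stage before."""
    hidden, predictions = [], []
    for stage, cue in zip(stages, cues):
        hidden, probs = stage.forward(cue, hidden)
        predictions.append(probs)
    return predictions


rng = random.Random(0)
CUE_DIM, HIDDEN_DIM, NUM_CLASSES = 8, 4, 3  # toy sizes, assumed

# Assumed order: pose first, then motion, then appearance (raw images).
stages = [
    ChainStage(rng, CUE_DIM, 0, HIDDEN_DIM, NUM_CLASSES),           # pose
    ChainStage(rng, CUE_DIM, HIDDEN_DIM, HIDDEN_DIM, NUM_CLASSES),  # motion
    ChainStage(rng, CUE_DIM, HIDDEN_DIM, HIDDEN_DIM, NUM_CLASSES),  # appearance
]
# Stand-in feature vectors; in practice each would come from its own stream.
cues = [[rng.random() for _ in range(CUE_DIM)] for _ in stages]

predictions = chained_predict(stages, cues)
final_prediction = predictions[-1]  # treating the last stage as final is an assumption
```

Because every stage produces its own class distribution, a sketch like this makes it easy to inspect how much each added cue changes the prediction.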

Code Implementations: 1 repo
