CV · Nov 8, 2016

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li

arXiv:1611.02447v2 21.4384 citations h-index: 57

Originality: Incremental advance

AI Analysis

This paper addresses real-time human action recognition for computer vision applications and represents an incremental improvement.

The paper tackles action recognition by encoding 3D skeleton sequences into 2D Joint Trajectory Maps and classifying those maps with Convolutional Neural Networks, reporting state-of-the-art results on three public benchmarks.

Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
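
To make the encoding idea concrete, here is a minimal sketch of how a 3D skeleton sequence might be rasterized into several 2D trajectory images. The abstract does not give the construction details, so everything specific here is an assumption for illustration: the function name `joint_trajectory_maps`, the choice of three axis-aligned projection planes, the image size, and the use of hue to encode temporal order. It is not the authors' implementation.

```python
import colorsys

import numpy as np


def joint_trajectory_maps(seq, size=64, samples=8):
    """Rasterize a skeleton sequence into three 2D trajectory images.

    seq: array-like of shape (T, J, 3) holding T frames of J joints in 3D.
    Returns a dict mapping plane name to a (size, size, 3) uint8 image.
    Assumption: time is encoded as hue (red for early, blue for late).
    """
    seq = np.asarray(seq, dtype=float)
    n_frames = seq.shape[0]

    # Scale every axis of the whole sequence into pixel coordinates.
    lo = seq.min(axis=(0, 1))
    span = np.maximum(seq.max(axis=(0, 1)) - lo, 1e-9)
    pix = (seq - lo) / span * (size - 1)

    # Assumed projection planes: pairs of coordinate axes.
    planes = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}
    maps = {name: np.zeros((size, size, 3), dtype=np.uint8) for name in planes}

    # Interpolation weights used to draw each frame-to-frame segment.
    alphas = np.linspace(0.0, 1.0, samples)[:, None, None]

    for t in range(n_frames - 1):
        hue = 0.66 * t / max(n_frames - 2, 1)
        rgb = np.array(colorsys.hsv_to_rgb(hue, 1.0, 1.0)) * 255
        color = rgb.astype(np.uint8)

        # Sample points along the segment from frame t to t + 1, all joints.
        seg = (1 - alphas) * pix[t] + alphas * pix[t + 1]
        idx = np.rint(seg).astype(int).reshape(-1, 3)

        for name, (a, b) in planes.items():
            maps[name][idx[:, b], idx[:, a]] = color

    return maps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(20, 15, 3)).cumsum(axis=0)  # synthetic random walk
    for name, image in joint_trajectory_maps(demo).items():
        print(name, image.shape, int((image.sum(axis=2) > 0).sum()), "pixels drawn")
```

Each returned image could then be fed to an ordinary image ConvNet. How the per-image predictions are combined is outside what the abstract states, so it is left out of this sketch.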
