CV · Nov 8, 2016

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

arXiv:1611.02447v2 · 384 citations
AI Analysis

This work addresses real-time human action recognition for computer vision applications and represents an incremental improvement.

The paper tackles video-based action recognition by encoding 3D skeleton sequences into 2D Joint Trajectory Maps (JTM) and classifying them with Convolutional Neural Networks, achieving state-of-the-art results on three public benchmarks.

Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to use ConvNets effectively for video-based recognition is still an open problem. In this paper, we propose a compact, effective, yet simple method to encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and adopt ConvNets to exploit their discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, namely the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
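The abstract does not spell out how a skeleton sequence becomes a set of 2D images. As a rough illustration only, the Python sketch below draws each joint's path onto three orthogonal planes (front, top, side) and uses hue to encode time. The function name `joint_trajectory_maps`, the choice of planes, the 64-pixel resolution, the per-axis normalisation and the hue range are all assumptions made for this sketch; the authors' actual colour encoding may differ.

```python
import colorsys

import numpy as np


def joint_trajectory_maps(skeleton, size=64, steps=64):
    """Hypothetical sketch: render a 3D skeleton sequence as three 2D trajectory images.

    skeleton: array-like of shape (T, J, 3) with x, y, z for each joint and frame.
    Returns an array of shape (3, size, size, 3): one RGB image per plane,
    with channel values in [0, 1].
    """
    seq = np.asarray(skeleton, dtype=float)
    T = seq.shape[0]
    # Scale each axis to [0, 1] over the whole sequence (constant axes map to 0).
    flat = seq.reshape(-1, 3)
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    norm = (seq - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Assumed projections: front (x, y), top (x, z) and side (z, y).
    planes = [(0, 1), (0, 2), (2, 1)]
    maps = np.zeros((len(planes), size, size, 3))
    for p, (a, b) in enumerate(planes):
        for t in range(T - 1):
            # Hue encodes time: the first segment is red and the last is
            # blue (hue 0.7), so the colour wheel does not wrap back to red.
            hue = 0.7 * t / max(T - 2, 1)
            rgb = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            # Sample points between frames t and t + 1 so each trajectory is
            # drawn as a continuous line rather than isolated dots.
            for s in np.linspace(0.0, 1.0, steps):
                pts = (1.0 - s) * norm[t] + s * norm[t + 1]  # shape (J, 3)
                cols = np.clip(np.round(pts[:, a] * (size - 1)).astype(int), 0, size - 1)
                # Image rows grow downwards, so flip the vertical coordinate.
                rows = np.clip(np.round((1.0 - pts[:, b]) * (size - 1)).astype(int), 0, size - 1)
                # Later segments overwrite earlier ones where paths cross.
                maps[p, rows, cols] = rgb
    return maps
```

For example, calling `joint_trajectory_maps(seq)` on a `(40, 20, 3)` array returns three 64×64 RGB images, each of which could then be fed to an image ConvNet. Normalising each axis independently keeps the sketch short and removes dependence on the subject's position and size, but it also stretches the aspect ratio of the motion, which a fuller implementation might avoid.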
