CVApr 25, 2017

Skeleton-based Action Recognition with Convolutional Neural Networks

arXiv:1704.07595v1410 citations
Originality Incremental advance
AI Analysis

This addresses action recognition from skeleton data, offering a novel CNN approach that outperforms RNN-based methods, though it is incremental in improving existing techniques.

The paper tackles skeleton-based action recognition by proposing a CNN-based framework for classification and detection, achieving 89.3% accuracy on NTU RGB+D and 93.7% mAP on PKU-MMD.

Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNN). In this paper, we propose a novel convolutional neural networks (CNN) based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into CNN for label prediction. A novel skeleton transformer module is designed to rearrange and select important skeleton joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on validation set of the NTU RGB+D dataset. For action detection in untrimmed videos, we develop a window proposal network to extract temporal segment proposals, which are further classified within the same network. On the recent PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large margin.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes