CV · Apr 19, 2017

Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN

arXiv:1704.05645v2 · 257 citations
Originality: Incremental advance
AI Analysis

This work addresses action recognition from skeleton data, with applications such as surveillance and human-computer interaction. It is an incremental advance because it builds on existing CNN fine-tuning methods.

The paper tackles skeleton-based action recognition by transforming skeleton videos into color images with a translation-scale invariant mapping, then classifying those images with a multi-scale deep CNN fine-tuned from pre-trained models. It reports state-of-the-art results on NTU RGB+D, UTD-MHAD, MSRC-12, and G3D, with large margins on the first three.
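The translation-scale invariant mapping can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: it assumes each coordinate axis is min-max normalised over the joints and frames of a single sequence (which makes the output unchanged under translation and positive scaling, and needs no dataset-wide statistics), and it assumes a layout with joints as rows, frames as columns, and the x, y, z axes as the R, G, B channels. The names `skeleton_to_image` and `Skeleton` are hypothetical.

```python
from typing import List, Sequence

# Assumed input layout: skeleton[t][j] holds the coordinates of joint j at frame t.
Skeleton = Sequence[Sequence[Sequence[float]]]


def skeleton_to_image(skeleton: Skeleton) -> List[List[List[int]]]:
    """Map a skeleton sequence to an 8-bit image indexed as image[joint][frame][axis].

    Each coordinate axis is min-max normalised over this sequence only, so the
    output does not change when the skeleton is translated or scaled by a
    positive factor, and no dataset-wide statistics are needed.
    """
    num_frames = len(skeleton)
    num_joints = len(skeleton[0])
    num_axes = len(skeleton[0][0])  # 3 for 3D skeletons, 2 for 2D skeletons

    # Per-axis minimum and maximum over all joints and frames of the sequence.
    values = [
        [skeleton[t][j][c] for t in range(num_frames) for j in range(num_joints)]
        for c in range(num_axes)
    ]
    lows = [min(v) for v in values]
    highs = [max(v) for v in values]

    image = []
    for j in range(num_joints):  # rows: joints
        row = []
        for t in range(num_frames):  # columns: frames
            pixel = []
            for c in range(num_axes):  # channels: coordinate axes
                span = highs[c] - lows[c]
                # A constant axis carries no information; map it to 0.
                ratio = (skeleton[t][j][c] - lows[c]) / span if span else 0.0
                pixel.append(round(255 * ratio))
            row.append(pixel)
        image.append(row)
    return image


if __name__ == "__main__":
    # Toy sequence: 2 frames, 2 joints, 3D coordinates (illustrative values).
    skeleton = [
        [[0, 0, 0], [2, 4, 1]],
        [[1, 2, 3], [4, 0, 2]],
    ]
    # The same motion, shifted and scaled by 2.
    moved = [[[2 * x + 10, 2 * y - 3, 2 * z + 7] for x, y, z in frame] for frame in skeleton]
    print(skeleton_to_image(skeleton) == skeleton_to_image(moved))  # True
```

Because the normalisation uses only the sequence's own minimum and maximum, shifting or scaling the whole skeleton yields the same image, and a 2D skeleton simply produces a two-channel image. In a full pipeline the result would still need resizing to the CNN's input resolution.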

This paper presents an image-classification-based approach to skeleton-based video action recognition. First, a dataset-independent, translation-scale invariant image mapping method is proposed, which transforms skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and ResNet. Even though skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on popular benchmark datasets, e.g., NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the largest and most challenging dataset, NTU RGB+D, as well as on UTD-MHAD and MSRC-12, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.
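The abstract names a multi-scale CNN without detailing the scales. As one hypothetical reading, the same skeleton-image is presented to the network at several input resolutions. The sketch below only produces those rescaled copies with nearest-neighbour resampling; `resize_nearest`, `multi_scale_inputs`, and the side lengths used are illustrative assumptions, and the fine-tuned network itself (e.g., AlexNet, VGGNet, or ResNet) is not shown.

```python
from typing import List

# Assumed image layout: image[row][column][channel], as 8-bit integers.
Image = List[List[List[int]]]


def resize_nearest(image: Image, height: int, width: int) -> Image:
    """Nearest-neighbour resize of an image to height x width."""
    src_h, src_w = len(image), len(image[0])
    return [
        # Each output pixel copies the source pixel its position maps back to.
        [list(image[r * src_h // height][c * src_w // width]) for c in range(width)]
        for r in range(height)
    ]


def multi_scale_inputs(image: Image, sizes: List[int]) -> List[Image]:
    """Return one square copy of the image for each side length in `sizes`."""
    return [resize_nearest(image, s, s) for s in sizes]


if __name__ == "__main__":
    # Toy 1 x 2 skeleton-image: one black and one white pixel.
    tiny = [[[0, 0, 0], [255, 255, 255]]]
    for scaled in multi_scale_inputs(tiny, [2, 4]):
        print(len(scaled), len(scaled[0]))  # prints "2 2", then "4 4"
```

Nearest-neighbour resampling is used here only because it copies existing pixel values without blending them; bilinear interpolation would be an equally reasonable choice.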
