A Deep Learning Approach for Real-Time 3D Human Action Recognition from Skeletal Data
This work aims to enhance monitoring and security in public transport through vision-based surveillance. The contribution is incremental: it builds on existing deep learning methods and adds some novel adaptations.
The authors tackle real-time 3D human action recognition from skeletal data by encoding skeleton poses and motions into RGB images, enhancing them with Adaptive Histogram Equalization, and classifying them with DenseNet-based networks. They report state-of-the-art accuracy on two datasets with low computational time, and promising results on a new surveillance dataset.
We present a new deep learning approach for real-time 3D human action recognition from skeletal data and apply it to develop a vision-based intelligent surveillance system. Given a skeleton sequence, we propose to encode skeleton poses and their motions into a single RGB image. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the color images to enhance their local patterns and generate more discriminative features. For learning and classification tasks, we design Deep Neural Networks based on the Densely Connected Convolutional Architecture (DenseNet) to extract features from the enhanced color images and classify them into action classes. Experimental results on two challenging datasets show that the proposed method reaches state-of-the-art accuracy, whilst requiring low computational time for training and inference. This paper also introduces CEMEST, a new RGB-D dataset depicting passenger behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos of realistic normal and anomalous events. We achieve promising results under the real-world conditions of this dataset with the support of data augmentation and transfer learning techniques. This enables the construction of real-world applications based on deep learning for enhancing monitoring and security in public transport.
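The first two stages described above (skeleton-to-image encoding followed by local contrast enhancement) can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact encoding, so everything concrete here is an assumption made for illustration: the function names `encode_skeleton_sequence`, `equalize_tile`, and `tiled_equalization` are hypothetical, the layout (joints as rows, frames as columns, x/y/z mapped to R/G/B, poses stacked above motions) is one plausible choice, and the enhancement step is a simplified tile-wise histogram equalization without the interpolation between tiles that full AHE implementations normally use.

```python
import numpy as np


def encode_skeleton_sequence(seq: np.ndarray) -> np.ndarray:
    """Encode a (frames, joints, 3) skeleton sequence as one uint8 RGB image.

    Assumed layout (not taken from the paper): rows are joints, columns are
    frames, and (x, y, z) map to (R, G, B). The top half of the image holds
    poses and the bottom half holds frame-to-frame motions.
    """
    seq = np.asarray(seq, dtype=np.float64)
    # Motion is approximated as the difference between consecutive frames;
    # the first frame gets zero motion so both halves have the same width.
    motions = np.diff(seq, axis=0, prepend=seq[:1])

    def to_uint8(x: np.ndarray) -> np.ndarray:
        # Min-max normalize the whole array into the range [0, 255].
        lo, hi = x.min(), x.max()
        scale = hi - lo if hi > lo else 1.0
        return np.round(255.0 * (x - lo) / scale).astype(np.uint8)

    pose_img = to_uint8(seq).transpose(1, 0, 2)        # (joints, frames, 3)
    motion_img = to_uint8(motions).transpose(1, 0, 2)  # (joints, frames, 3)
    return np.concatenate([pose_img, motion_img], axis=0)


def equalize_tile(tile: np.ndarray) -> np.ndarray:
    """Histogram-equalize one single-channel uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant tile: nothing to stretch
        return tile.copy()
    lut = np.clip(np.round(255.0 * (cdf - cdf_min) / denom), 0, 255)
    return lut.astype(np.uint8)[tile]


def tiled_equalization(img: np.ndarray, tiles=(2, 2)) -> np.ndarray:
    """Simplified stand-in for AHE: equalize each tile and channel separately.

    Full AHE also interpolates between neighboring tile mappings; that step
    is omitted here to keep the sketch short.
    """
    out = np.empty_like(img)
    row_groups = np.array_split(np.arange(img.shape[0]), tiles[0])
    col_groups = np.array_split(np.arange(img.shape[1]), tiles[1])
    for rows in row_groups:
        for cols in col_groups:
            if rows.size == 0 or cols.size == 0:
                continue
            r = slice(rows[0], rows[-1] + 1)
            c = slice(cols[0], cols[-1] + 1)
            for ch in range(img.shape[2]):
                out[r, c, ch] = equalize_tile(img[r, c, ch])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.normal(size=(16, 5, 3))  # 16 frames, 5 joints, xyz
    image = encode_skeleton_sequence(sequence)
    enhanced = tiled_equalization(image, tiles=(2, 2))
    print(image.shape, enhanced.dtype)
```

In a complete pipeline, the enhanced image would then be resized and passed to a DenseNet-style classifier; that stage is left out here because the abstract gives no architectural details beyond the DenseNet family.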