CV Sep 21, 2020

DeepActsNet: Spatial and Motion features from Face, Hands, and Body Combined with Convolutional and Graph Networks for Improved Action Recognition

Umar Asif, Deval Mehta, Stefan von Cavallar, Jianbin Tang, Stefan Harrer

arXiv:2009.09818v3

Originality: Incremental advance

AI Analysis

This work addresses action recognition for video analysis, offering incremental improvements by integrating additional features beyond standard body skeleton data.

The paper tackles action recognition by combining body skeleton data with spatial and motion features from the face and hands. It introduces Deep Action Stamps (DeepActs) as a novel data representation and DeepActsNet as an ensemble model. Experiments on the NTU60, NTU120, and SYSU datasets show improved accuracy at lower computational cost than state-of-the-art methods.

Existing action recognition methods mainly focus on joint and bone information in human body skeleton data because of its robustness to complex backgrounds and to dynamic characteristics of the environment. In this paper, we combine body skeleton data with spatial and motion features from the face and both hands, and present "Deep Action Stamps (DeepActs)", a novel data representation to encode actions from video sequences. We also present "DeepActsNet", a deep-learning-based ensemble model that learns convolutional and structural features from Deep Action Stamps for highly accurate action recognition. Experiments on three challenging action recognition datasets (NTU60, NTU120, and SYSU) show that the proposed model, trained using Deep Action Stamps, produces considerable improvements in action recognition accuracy at lower computational cost than state-of-the-art methods.
