Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition
This work addresses computational efficiency in skeleton-based action recognition for applications such as surveillance and human-computer interaction. It is an incremental improvement that enhances existing methods with explicit joint semantics and hierarchical modeling.
The paper tackles the inefficiency of deep models for skeleton-based human action recognition by proposing a semantics-guided neural network (SGN) that explicitly incorporates joint semantics and models joint relationships hierarchically. SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets with a model size an order of magnitude smaller than most previous works.
Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of human skeleton data. Recently, there has been a trend of using very deep feedforward neural networks to model the 3D coordinates of joints without considering computational efficiency. In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition. We explicitly introduce the high-level semantics of joints (joint type and frame index) into the network to enhance the feature representation capability. In addition, we exploit the relationship of joints hierarchically through two modules, i.e., a joint-level module for modeling the correlations of joints in the same frame and a frame-level module for modeling the dependencies of frames by taking the joints in the same frame as a whole. A strong baseline is proposed to facilitate the study of this field. With an order of magnitude smaller model size than most previous works, SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets. The source code is available at https://github.com/microsoft/SGN.
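To make the two ideas in the abstract concrete, the following is a minimal pure-Python sketch of the data flow only, not the released implementation. It assumes that joint type and frame index are encoded as one-hot vectors appended to each joint's 3D coordinates, and it stands in for the joint-level and frame-level modules with plain element-wise max pooling; all learned layers are omitted. The function names (one_hot, add_semantics, pool_max, hierarchical_summary) are hypothetical and chosen for illustration.

```python
from typing import List

Vector = List[float]


def one_hot(index: int, size: int) -> Vector:
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def add_semantics(sequence: List[List[Vector]]) -> List[List[Vector]]:
    """Append joint-type and frame-index codes to every joint.

    sequence[t][j] holds the 3D coordinates of joint j in frame t.
    Assumption: semantics are one-hot codes concatenated to the coordinates.
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    return [
        [coords + one_hot(j, num_joints) + one_hot(t, num_frames)
         for j, coords in enumerate(frame)]
        for t, frame in enumerate(sequence)
    ]


def pool_max(vectors: List[Vector]) -> Vector:
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(values) for values in zip(*vectors)]


def hierarchical_summary(sequence: List[List[Vector]]) -> Vector:
    """Two-stage aggregation: joints within a frame, then frames.

    Assumption: max pooling replaces the learned joint-level and
    frame-level modules, so the output only illustrates the data flow.
    """
    enriched = add_semantics(sequence)
    # Joint level: treat all joints of one frame as a whole.
    frame_features = [pool_max(frame) for frame in enriched]
    # Frame level: aggregate the per-frame features over time.
    return pool_max(frame_features)


if __name__ == "__main__":
    # Toy sequence: 2 frames, 2 joints, 3D coordinates per joint.
    toy = [
        [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]],
        [[6.0, 7.0, 8.0], [9.0, 10.0, 11.0]],
    ]
    print(add_semantics(toy)[1][0])
    print(hierarchical_summary(toy))
```

In this toy setting each joint grows from 3 values to 3 + (number of joints) + (number of frames) values. In a trained network the semantic codes would pass through learned layers before any pooling, which is what would let them influence the final representation.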