Unifying Graph Embedding Features with Graph Convolutional Networks for Skeleton-based Action Recognition
This work addresses the problem of improving human action recognition for applications like surveillance or human-computer interaction, but it is incremental as it builds on existing graph convolutional network methods.
The paper tackles the limitation of basic graph features in skeleton-based action recognition by unifying 15 graph embedding features into a graph convolutional network, and it reports state-of-the-art performance on three large-scale datasets (NTU-RGB+D, Kinetics, and SYSU-3D).
Combining skeleton structure with graph convolutional networks has achieved remarkable performance in human action recognition. Because current research focuses on designing basic graphs to represent skeleton data, the resulting embedding features contain only basic topological information and cannot capture a more systematic view of the skeleton data. In this paper, we overcome this limitation by proposing a novel framework that unifies 15 graph embedding features into the graph convolutional network for human action recognition. The aim is to take full advantage of graph information to distinguish key joints, bones, and body parts in human action, instead of relying exclusively on a single feature or domain. Additionally, we investigate in depth how to find the best graph features of the skeleton structure for improving human action recognition. The topological information of the skeleton sequence is also explored to further enhance performance in a multi-stream framework. Moreover, the unified graph features are extracted by adaptive methods during training, which yields further improvements. Our model is validated on three large-scale datasets, namely NTU-RGB+D, Kinetics, and SYSU-3D, and outperforms state-of-the-art methods. Overall, our work unifies graph embedding features to promote systematic research on human action recognition.
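To make the idea of feeding a graph embedding feature into a graph convolution concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not name the 15 features or say how they are fused, so this example assumes degree centrality as a stand-in feature, assumes fusion by concatenating it to each joint's coordinates, and uses a hypothetical 5-joint toy skeleton with made-up coordinates and weights.

```python
# Illustrative sketch only: toy skeleton, stand-in feature, assumed fusion.

# Hypothetical 5-joint skeleton (not from the paper): joint 1 is a hub
# connected to joints 0, 2, 3 and 4.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def adjacency(edges, n):
    """Build a symmetric n x n adjacency matrix from undirected edges."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return a


def degree_centrality(a):
    """Degree of each joint divided by (n - 1), the maximum possible degree."""
    n = len(a)
    return [sum(row) / (n - 1) for row in a]


def normalized_adjacency(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by a basic GCN."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def matmul(p, q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


def augment(x, feature):
    """Assumed fusion: append one scalar graph feature to each joint's vector."""
    return [row + [f] for row, f in zip(x, feature)]


def graph_conv(a_norm, x, w):
    """One linear graph convolution step: A_norm @ X @ W (no activation)."""
    return matmul(matmul(a_norm, x), w)


# Made-up 3D coordinates for one frame, one row per joint.
coords = [[0.0, 1.8, 0.0], [0.0, 1.5, 0.0], [0.0, 1.0, 0.0],
          [-0.4, 1.4, 0.0], [0.4, 1.4, 0.0]]

adj = adjacency(EDGES, NUM_JOINTS)
centrality = degree_centrality(adj)      # one scalar per joint
features = augment(coords, centrality)   # 5 joints x 4 channels
weights = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.5, -0.5]]  # arbitrary 4 x 2
output = graph_conv(normalized_adjacency(adj), features, weights)
```

In this toy graph the hub joint receives the largest centrality value, so the extra channel gives the convolution a signal about which joints are structurally central. A real pipeline would add the temporal dimension and learned weights, and would use whichever features and fusion scheme the paper actually specifies.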