CV · Feb 8, 2022

Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition

arXiv:2202.04075v1 · 134 citations
Originality: Incremental advance
AI Analysis

This work targets skeleton-based action recognition, which is relevant to applications such as surveillance and human-computer interaction. It is an incremental advance: it builds on existing GCN methods and adds a new joint-bone fusion approach.

The paper addresses two limitations of existing graph convolutional networks for skeleton-based human action recognition: insufficient exploration of joint-bone correlations and reliance on labeled data. It proposes a semi-supervised method and reports state-of-the-art semi-supervised performance on the NTU-RGB+D and Kinetics-Skeleton datasets.

In recent years, graph convolutional networks (GCNs) have played an increasingly critical role in skeleton-based human action recognition. However, most GCN-based methods still have two main limitations: 1) They consider only the motion information of the joints, or process the joints and bones separately, and so cannot fully explore the latent functional correlation between joints and bones for action recognition. 2) Most of these works follow the supervised learning paradigm, which relies heavily on massive labeled training data. To address these issues, we propose a semi-supervised skeleton-based action recognition method, a setting that has rarely been explored before. We design a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder to achieve semi-supervised learning. Specifically, the CD-JBF-GC can explore the motion transmission between the joint stream and the bone stream, thereby encouraging both streams to learn more discriminative feature representations. The pose-prediction-based auto-encoder in the self-supervised training stage allows the network to learn motion representations from unlabeled data, which is essential for action recognition. Extensive experiments on two popular datasets, i.e., NTU-RGB+D and Kinetics-Skeleton, demonstrate that our model achieves state-of-the-art performance for semi-supervised skeleton-based action recognition and is also useful for fully-supervised methods.
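The two inputs and the self-supervised target mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the convention common in two-stream skeleton models that a bone is the vector from a joint's parent to the joint, and it assumes the pose prediction task means predicting held-out future frames from observed ones. The 5-joint skeleton, the `PARENTS` array, and all function names are hypothetical and chosen only for illustration; the correlation-driven fusion itself is not shown.

```python
import numpy as np

# Hypothetical toy skeleton with 5 joints: PARENTS[v] is the parent of joint v.
# Joint 0 is the root and is listed as its own parent, so its bone is zero.
PARENTS = np.array([0, 0, 1, 1, 3])


def joints_to_bones(joints, parents=PARENTS):
    """Derive the bone stream from the joint stream.

    joints: array of shape (T, V, C) = (frames, joints, coordinates).
    Returns an array of the same shape where bone[v] = joint[v] - joint[parent[v]].
    """
    return joints - joints[:, parents, :]


def pose_prediction_pairs(joints, horizon):
    """Split a sequence into observed frames and the future frames to predict.

    This is an assumed form of the self-supervised target: no action label
    is needed, only the sequence itself.
    """
    if not 0 < horizon < joints.shape[0]:
        raise ValueError("horizon must be between 1 and T - 1")
    return joints[:-horizon], joints[-horizon:]


def mse(pred, target):
    """Mean squared error between predicted and true poses."""
    return float(np.mean((pred - target) ** 2))


# Usage on random data standing in for a 10-frame, 3D skeleton sequence.
rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 5, 3))
bones = joints_to_bones(seq)
observed, future = pose_prediction_pairs(seq, horizon=2)
```

In a setup like this, an encoder would consume `observed` (and the matching bones), a decoder head would output a prediction with the shape of `future`, and `mse` would serve as the reconstruction loss on unlabeled sequences.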

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
