Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition
This work addresses the problem of distinguishing similar actions in noisy skeleton data for applications like human-computer interaction, though it appears incremental as it builds on existing graph convolutional networks.
The paper tackles fine-grained skeleton-based action recognition by proposing a Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss, achieving state-of-the-art performance on datasets like FineGym99, FSD-10, and NTU-RGB+D X-view.
Graph convolutional networks have been widely used in skeleton-based action recognition. However, existing approaches are limited in fine-grained action recognition due to the similarity of inter-class data. Moreover, the noisy data from pose extraction increases the challenge of fine-grained recognition. In this work, we propose a flexible attention block called Channel-Variable Spatial-Temporal Attention (CVSTA) to enhance the discriminative power of spatial-temporal joints and obtain a more compact intra-class feature distribution. Based on CVSTA, we construct a Multi-Dimensional Refinement Graph Convolutional Network (MDR-GCN), which can improve the discrimination among channel-, joint- and frame-level features for fine-grained actions. Furthermore, we propose a Robust Decouple Loss (RDL), which significantly boosts the effect of the CVSTA and reduces the impact of noise. The proposed method combining MDR-GCN with RDL outperforms the known state-of-the-art skeleton-based approaches on fine-grained datasets, FineGym99 and FSD-10, and also on the coarse dataset NTU-RGB+D X-view version.