Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition
This work addresses sample efficiency and fine-grained recognition in skeleton-based action recognition, with applications such as human-computer interaction. The contribution is incremental, as it builds on existing meta-learning methods.
The paper tackles the problem of unstable representativity in one-shot skeleton-based action recognition by replacing global average embedding comparisons with a focused sum of similarity measurements on aligned local embeddings of action-critical segments, achieving state-of-the-art results on the NTU-RGB+D 120 benchmark.
Skeleton-based action recognition has received increasing attention because skeleton representations reduce the amount of training data needed by eliminating visual information irrelevant to actions. To further improve sample efficiency, meta-learning-based one-shot learning solutions have been developed for skeleton-based action recognition. These methods find the nearest neighbor according to the similarity between instance-level global average embeddings. However, such a measurement has unstable representativity because of inadequate generalized learning on local invariant and noisy features, whereas, intuitively, finer-grained recognition usually relies on identifying key local body movements. To address this limitation, we present the Adaptive Local-Component-aware Graph Convolutional Network, which replaces the comparison metric with a focused sum of similarity measurements on aligned local embeddings of action-critical spatial/temporal segments. Comprehensive one-shot experiments on the public NTU-RGB+D 120 benchmark indicate that our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
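To make the contrast between the two comparison metrics concrete, here is a minimal Python sketch. It assumes each skeleton sequence has already been encoded into K aligned local embeddings (one per spatial/temporal component), and compares nearest-neighbor one-shot classification using the global average embedding against a weighted ("focused") sum of per-component cosine similarities. The function names, the `focus` weights, and the toy vectors are hypothetical illustrations. The GCN encoder, the alignment procedure, and the way the actual method learns which segments are action-critical are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical layout: a sample is a list of K local embeddings, one per
# spatial/temporal component, aligned so that index k refers to the same
# component (e.g. the same body part in the same temporal segment) in every sample.
Sample = Sequence[Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def global_average(sample: Sample) -> List[float]:
    """Average the local embeddings into one instance-level embedding."""
    dim = len(sample[0])
    return [sum(v[d] for v in sample) / len(sample) for d in range(dim)]


def global_similarity(query: Sample, support: Sample) -> float:
    """Baseline: compare instance-level global average embeddings."""
    return cosine(global_average(query), global_average(support))


def focused_local_similarity(query: Sample, support: Sample,
                             focus: Sequence[float]) -> float:
    """Weighted sum of per-component similarities on aligned local embeddings.

    `focus` is a hypothetical stand-in for learned weights that emphasise
    action-critical components; it is not the paper's actual mechanism.
    """
    if not (len(query) == len(support) == len(focus)):
        raise ValueError("query, support and focus must have the same number of components")
    return sum(w * cosine(q, s) for w, q, s in zip(focus, query, support))


def nearest_neighbor(query: Sample, supports: Dict[str, Sample],
                     similarity: Callable[[Sample, Sample], float]) -> str:
    """One-shot classification: label of the most similar support sample."""
    return max(supports, key=lambda label: similarity(query, supports[label]))


# Toy example with K = 2 components and 2-D embeddings (hypothetical data).
query = [[1.0, 0.0], [0.0, 1.0]]
supports = {
    # Same components as the query but swapped: the averages coincide.
    "class_a": [[0.0, 1.0], [1.0, 0.0]],
    # Each component points the same way as the query's, with a different magnitude.
    "class_b": [[1.0, 0.0], [0.0, 2.0]],
}
uniform_focus = [1.0 / len(query)] * len(query)

print(nearest_neighbor(query, supports, global_similarity))  # class_a
print(nearest_neighbor(
    query, supports,
    lambda q, s: focused_local_similarity(q, s, uniform_focus)))  # class_b
```

In the toy data, averaging makes `class_a` look identical to the query even though its components are swapped, while the aligned local comparison picks `class_b`. Uniform weights are used only for simplicity; a focused variant would up-weight the components judged action-critical.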