CV · Feb 5, 2023

Pyramid Self-attention Polymerization Learning for Semi-supervised Skeleton-based Action Recognition

arXiv:2302.02327v1 · 48 citations · h-index: 34 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of limited labeled data in skeleton-based action recognition by incorporating coarse-grained motion characteristics, offering a domain-specific improvement.

The paper tackles semi-supervised skeleton-based action recognition by proposing a framework that learns action representations at multiple granularities (body, part, and joint levels) using contrastive learning, achieving competitive performance on the NTU RGB+D and North-Western UCLA datasets.
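To make the multi-granularity contrastive objective concrete, here is a minimal plain-Python sketch. It assumes an InfoNCE-style loss with cosine similarity and a temperature `tau`, applied separately at the body, part, and joint levels between two views of each sample (joint data and motion data, as in the abstract below), then summed with per-level weights. The exact form of the paper's Coarse-to-fine Contrastive Loss is not given on this page, so `info_nce`, `coarse_to_fine_loss`, `tau`, `weights`, and the dict-of-levels input format are hypothetical choices. The per-level feature vectors are also taken as given rather than produced by an encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, tau=0.1):
    """Mean InfoNCE loss (assumed form): positives[i] is the positive for anchors[i],
    and every other entry of positives serves as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive pair
    return total / len(anchors)


def coarse_to_fine_loss(joint_view, motion_view, tau=0.1, weights=(1.0, 1.0, 1.0)):
    """Hypothetical coarse-to-fine loss: weighted sum of per-level InfoNCE terms
    between the joint-data view and the motion-data view. Each view maps
    'body', 'part', and 'joint' to a batch of feature vectors."""
    levels = ("body", "part", "joint")
    return sum(w * info_nce(joint_view[lv], motion_view[lv], tau) for w, lv in zip(weights, levels))


# Toy batch of two samples with 2-D features at every level.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
view = {"body": aligned, "part": aligned, "joint": aligned}
loss_aligned = coarse_to_fine_loss(view, view)
loss_swapped = info_nce(aligned, swapped)
```

With the positives aligned, each level contributes about 4.5e-5, which is log(1 + e^-10) at `tau = 0.1`. Swapping the positives raises the single-level loss to about 10, because the true pair now has the lowest similarity in its row.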

Most semi-supervised skeleton-based action recognition approaches learn skeleton action representations only at the joint level. They neglect the crucial motion characteristics at the coarser-grained body level (e.g., limbs, trunk), which provide rich additional semantic information even when labeled data are limited. In this work, we propose a novel Pyramid Self-attention Polymerization Learning (dubbed PSP Learning) framework. It jointly learns body-level, part-level, and joint-level action representations of joint and motion data, which contain abundant and complementary semantic information, via contrastive learning covering coarse-to-fine granularity. Specifically, to complement semantic information from coarse to fine granularity in skeleton actions, we design a new Pyramid Polymerizing Attention (PPA) mechanism. PPA first computes body-level, part-level, and joint-level attention maps, then polymerizes them level by level (i.e., from the body level to the part level, and further to the joint level). Moreover, we present a new Coarse-to-fine Contrastive Loss (CCL), consisting of body-level, part-level, and joint-level contrast losses, to jointly measure the similarity between the body-, part-, and joint-level contrasting features of joint and motion data. Finally, extensive experiments on the NTU RGB+D and North-Western UCLA datasets demonstrate the competitive performance of the proposed PSP Learning on the semi-supervised skeleton-based action recognition task. The source code of PSP Learning is publicly available at https://github.com/1xbq1/PSP-Learning.
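The level-by-level polymerization in PPA can be illustrated with a small plain-Python sketch. Everything specific here is an assumption rather than the paper's formulation:

- Part and body features are obtained by mean pooling.
- Each level's attention map is plain scaled dot-product self-attention, with the features serving as both queries and keys (no learned projections).
- "Polymerizing" is modeled as broadcasting the coarser map onto the finer level, adding it to the finer map, and renormalizing each row.

The names `pyramid_polymerized_attention`, `parts`, and `bodies` are hypothetical. The toy skeleton is not the NTU RGB+D joint layout. The released code at the repository linked above is the authoritative reference.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(feats, groups):
    """Average the feature vectors whose indices are listed in each group."""
    dim = len(feats[0])
    return [[sum(feats[i][k] for i in g) / len(g) for k in range(dim)] for g in groups]


def attention_map(feats):
    """Scaled dot-product self-attention weights; features act as both queries and keys."""
    scale = math.sqrt(len(feats[0]))
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in feats]) for q in feats]


def owner_index(groups, n):
    """Map each member index (0..n-1) to the index of the group that contains it."""
    owner = [0] * n
    for g_idx, members in enumerate(groups):
        for m in members:
            owner[m] = g_idx
    return owner


def lift(coarse_map, owner):
    """Broadcast a coarse map to the finer level: entry (i, j) copies the
    coarse weight between the groups owner[i] and owner[j]."""
    n = len(owner)
    return [[coarse_map[owner[i]][owner[j]] for j in range(n)] for i in range(n)]


def add_and_renormalize(fine_map, lifted_map):
    """Add the lifted coarse map to the fine map, then rescale each row to sum to 1."""
    out = []
    for rf, rl in zip(fine_map, lifted_map):
        row = [x + y for x, y in zip(rf, rl)]
        total = sum(row)
        out.append([x / total for x in row])
    return out


def pyramid_polymerized_attention(joint_feats, parts, bodies):
    """parts: lists of joint indices; bodies: lists of part indices (hypothetical layout)."""
    part_feats = mean_pool(joint_feats, parts)  # part level
    body_feats = mean_pool(part_feats, bodies)  # body level (average of part features)
    a_body = attention_map(body_feats)
    a_part = attention_map(part_feats)
    a_joint = attention_map(joint_feats)
    # Polymerize level by level: body -> part, then part -> joint.
    a_part_poly = add_and_renormalize(a_part, lift(a_body, owner_index(bodies, len(parts))))
    a_joint_poly = add_and_renormalize(a_joint, lift(a_part_poly, owner_index(parts, len(joint_feats))))
    # Aggregate joint features with the polymerized joint-level map.
    dim = len(joint_feats[0])
    out = [[sum(w * joint_feats[j][k] for j, w in enumerate(row)) for k in range(dim)] for row in a_joint_poly]
    return a_joint_poly, out


# Toy skeleton (hypothetical): 6 joints, 3 parts, 2 body regions.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.4, 0.6]]
parts = [[0, 1], [2, 3], [4, 5]]
bodies = [[0], [1, 2]]
attn, out = pyramid_polymerized_attention(feats, parts, bodies)
```

Adding and renormalizing keeps each row a probability distribution. As a result, coarse-level affinities bias the joint-level weights rather than replace them. A learned gate or a multiplicative combination would be an equally plausible reading of "polymerize".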


Similar