CV · Apr 29, 2021

3D Human Action Representation Learning via Cross-View Consistency Pursuit

arXiv:2104.14466v2 · 227 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning action representations without labeled data, with applications in human-computer interaction and surveillance, and presents an incremental improvement over existing contrastive learning methods.

The paper tackles unsupervised 3D skeleton-based action recognition with a cross-view contrastive learning framework, reporting state-of-the-art results on the NTU-60 and NTU-120 datasets and higher-quality learned representations.

In this work, we propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR), which leverages complementary supervision signals from multiple views. CrosSCLR consists of a single-view contrastive learning module (SkeletonCLR) and a cross-view consistent knowledge mining module (CVC-KM), which are trained collaboratively. CVC-KM exchanges high-confidence positive/negative samples and their distributions among views according to their embedding similarity, ensuring cross-view consistency in terms of contrastive context, i.e., similar distributions. Extensive experiments show that CrosSCLR achieves remarkable action recognition results on the NTU-60 and NTU-120 datasets under unsupervised settings and learns higher-quality action representations. Our code is available at https://github.com/LinguoLi/CrosSCLR.
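The positive-sample exchange described in the abstract can be illustrated with a minimal sketch in plain Python. It assumes a MoCo-style memory bank of L2-normalized embeddings per view and an InfoNCE loss with temperature `tau`. The helper names (`info_nce`, `mine_positives`, `cross_view_nce`), the toy 2-D vectors, and top-k selection by dot product are illustrative assumptions, not the CrosSCLR code. The sketch covers only the exchange of high-confidence positives between two views (called `u` and `v` here); the distribution-consistency part of CVC-KM is not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(z: Vector, z_key: Vector, bank: Sequence[Vector], tau: float = 0.07) -> float:
    """Single-view InfoNCE (SkeletonCLR-style, assumed): the augmented key is
    the only positive and every memory-bank entry is treated as a negative."""
    pos = math.exp(dot(z, z_key) / tau)
    neg = sum(math.exp(dot(z, m) / tau) for m in bank)
    return -math.log(pos / (pos + neg))


def mine_positives(z_other: Vector, bank_other: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k bank entries most similar to the query in the OTHER view
    (hypothetical helper; 'high confidence' is approximated by top-k similarity)."""
    sims = [dot(z_other, m) for m in bank_other]
    return sorted(range(len(bank_other)), key=lambda i: sims[i], reverse=True)[:k]


def cross_view_nce(z: Vector, z_key: Vector, bank: Sequence[Vector],
                   positives: Sequence[int], tau: float = 0.07) -> float:
    """InfoNCE in this view, where bank entries mined in the other view are
    counted as extra positives in the numerator instead of as negatives."""
    pos_set = set(positives)
    key = math.exp(dot(z, z_key) / tau)
    mined = sum(math.exp(dot(z, bank[i]) / tau) for i in pos_set)
    total = key + sum(math.exp(dot(z, m) / tau) for m in bank)
    return -math.log((key + mined) / total)


# Toy 2-D embeddings for two hypothetical views u and v (made-up values).
bank_u = [normalize(v) for v in ([1.0, 0.1], [0.0, 1.0], [-1.0, 0.2])]
bank_v = [normalize(v) for v in ([0.9, 0.3], [0.1, 1.0], [-1.0, 0.0])]
z_u, key_u = normalize([1.0, 0.0]), normalize([0.95, 0.05])
z_v = normalize([1.0, 0.2])

# View v decides which bank entries count as positives for view u.
pos = mine_positives(z_v, bank_v, k=1)
loss_plain = info_nce(z_u, key_u, bank_u, tau=0.2)
loss_cross = cross_view_nce(z_u, key_u, bank_u, pos, tau=0.2)
print(pos, round(loss_plain, 4), round(loss_cross, 4))
```

Because mined entries are added to the numerator while the denominator is unchanged, the cross-view loss is never larger than the single-view one, and it reduces to plain InfoNCE when no positives are mined. A practical version would work on batched tensors (for example in PyTorch) rather than Python lists.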

Code Implementations: 1 repo