CV · Aug 20, 2025

MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition

arXiv:2508.14889v1 · 1 citation · h-index: 58
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generalizing across diverse skeleton datasets for human action recognition. It offers a self-supervised approach that incrementally extends existing contrastive learning methods.

The paper tackles limited generalization in skeleton-based action recognition by proposing MS-CLR, a multi-skeleton contrastive learning framework that aligns pose representations across different skeleton conventions. This improves performance and sets new state-of-the-art results on the NTU RGB+D 60 and 120 datasets.

Contrastive learning has gained significant attention in skeleton-based action recognition for its ability to learn robust representations from unlabeled data. However, existing methods rely on a single skeleton convention, which limits their ability to generalize across datasets with diverse joint structures and anatomical coverage. We propose Multi-Skeleton Contrastive Learning (MS-CLR), a general self-supervised framework that aligns pose representations across multiple skeleton conventions extracted from the same sequence. This encourages the model to learn structural invariances and capture diverse anatomical cues, resulting in more expressive and generalizable features. To support this, we adapt the ST-GCN architecture to handle skeletons with varying joint layouts and scales through a unified representation scheme. Experiments on the NTU RGB+D 60 and 120 datasets demonstrate that MS-CLR consistently improves performance over strong single-skeleton contrastive learning baselines. A multi-skeleton ensemble further boosts performance, setting new state-of-the-art results on both datasets.
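The following is a minimal sketch of the core idea. It assumes that a shared encoder has already mapped each skeleton convention of a sequence to an embedding. The loss is a standard symmetric InfoNCE averaged over every pair of conventions, which is one plausible reading of "aligns pose representations across multiple skeleton conventions". The abstract does not state MS-CLR's exact objective. The helper `pad_skeleton`, the zero-padding-plus-mask scheme, the temperature of 0.1, and all function names are hypothetical illustrations. They are not the paper's unified representation scheme or API.

```python
import numpy as np


def pad_skeleton(seq, max_joints):
    """Zero-pad a (T, J, C) pose sequence to (T, max_joints, C) and return a joint mask.

    Hypothetical stand-in for a 'unified representation'; the paper's scheme may differ.
    """
    t, j, c = seq.shape
    if j > max_joints:
        raise ValueError("sequence has more joints than max_joints")
    padded = np.zeros((t, max_joints, c), dtype=seq.dtype)
    padded[:, :j, :] = seq
    mask = np.zeros(max_joints, dtype=bool)
    mask[:j] = True  # True marks real joints, False marks padding
    return padded, mask


def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE: row i of za and row i of zb come from the same sequence."""
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / temperature  # (N, N) scaled cosine similarities

    def diag_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # positives lie on the diagonal

    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def multi_skeleton_loss(embeddings, temperature=0.1):
    """Average pairwise InfoNCE over all pairs of skeleton conventions.

    embeddings: list of (N, D) arrays, one per convention, rows in the same sequence order.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two skeleton conventions")
    losses = [
        info_nce(embeddings[a], embeddings[b], temperature)
        for a in range(len(embeddings))
        for b in range(a + 1, len(embeddings))
    ]
    return float(np.mean(losses))


if __name__ == "__main__":
    # Illustrative joint counts only: 25 (Kinect v2 style) and 17 (COCO style).
    seq_a, _ = pad_skeleton(np.random.rand(16, 25, 3), max_joints=25)
    seq_b, mask_b = pad_skeleton(np.random.rand(16, 17, 3), max_joints=25)
    print(seq_a.shape, seq_b.shape, int(mask_b.sum()))  # (16, 25, 3) (16, 25, 3) 17
```

Rows with the same index must come from the same sequence, and the other rows in the batch act as negatives. In a real pipeline, the embeddings would come from the adapted ST-GCN encoder. That encoder could use the joint mask returned by `pad_skeleton` to ignore padded joints. The sketch leaves the mask unused.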
