CV | AI | Jan 12

Variational Contrastive Learning for Skeleton-based Action Recognition

arXiv:2601.07666v1 | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of capturing variability and uncertainty in human motion for skeleton-based action recognition, offering an incremental improvement over existing contrastive methods.

The paper addresses self-supervised representation learning for skeleton-based action recognition. It proposes a variational contrastive learning framework that integrates probabilistic latent modeling with contrastive learning, and it reports that the method consistently outperforms existing approaches on three benchmarks, particularly in low-label regimes.

In recent years, self-supervised representation learning for skeleton-based action recognition has advanced with the development of contrastive learning methods. However, most contrastive paradigms are inherently discriminative and often struggle to capture the variability and uncertainty intrinsic to human motion. To address this issue, we propose a variational contrastive learning framework that integrates probabilistic latent modeling with contrastive self-supervised learning. This formulation enables the learning of structured and semantically meaningful representations that generalize across different datasets and supervision levels. Extensive experiments on three widely used skeleton-based action recognition benchmarks show that our proposed method consistently outperforms existing approaches, particularly in low-label regimes. Moreover, qualitative analyses show that, compared with other methods, the features learned by our method are more relevant to the motion and sample characteristics and focus more on the important skeleton joints.
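
The abstract does not state the training objective, so the following is only a minimal sketch of how probabilistic latent modeling is commonly combined with a contrastive loss, not the paper's actual method. It assumes a diagonal Gaussian posterior per sample, a standard normal prior, an InfoNCE contrastive term between two augmented views, and a hypothetical weight `beta` on the KL term. All function names and default values are illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (diagonal Gaussian)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss: keys[i] is the positive for queries[i];
    all other keys in the batch act as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def variational_contrastive_loss(params_a, params_b, beta=0.01,
                                 temperature=0.1, seed=0):
    """params_a / params_b: lists of (mu, log_var) pairs, one per sample,
    for two augmented views of the same batch (assumed encoder outputs).
    Returns (total, contrastive term, mean KL term)."""
    rng = random.Random(seed)
    z_a = [reparameterize(mu, lv, rng) for mu, lv in params_a]
    z_b = [reparameterize(mu, lv, rng) for mu, lv in params_b]
    contrastive = info_nce(z_a, z_b, temperature)
    all_params = params_a + params_b
    kl = sum(kl_to_standard_normal(mu, lv) for mu, lv in all_params)
    kl /= len(all_params)
    return contrastive + beta * kl, contrastive, kl
```

In this sketch the KL term regularizes each sample's latent distribution toward the prior, while the contrastive term pulls the sampled latents of two views of the same sequence together. How the paper actually weights or structures these terms is not specified in the abstract.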

