CV | May 29, 2025

Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition

arXiv:2505.23012v1 | 3 citations | h-index: 38 | IEEE Trans Biom Behav Identity Sci
Originality: Incremental advance
AI Analysis

This work addresses skeleton-based action classification in computer vision, offering an incremental improvement by exploiting the previously underused interaction between moving and static joints.

The paper introduces a spatial-temporal joint density (STJD) measurement that quantifies interactions between moving and static joints and uses it to guide self-supervised learning. The resulting methods improve on state-of-the-art contrastive methods on the NTU RGB+D 120 dataset by 3.5 percentage points (X-sub) and 3.6 percentage points (X-set).

Traditional approaches to unsupervised or self-supervised learning for skeleton-based action classification have concentrated predominantly on the dynamic aspects of skeletal sequences. Yet the intricate interaction between the moving and static elements of the skeleton offers a rarely tapped discriminative potential for action classification. This paper introduces a novel measurement, referred to as spatial-temporal joint density (STJD), to quantify such interaction. Tracking the evolution of this density throughout an action can effectively identify a subset of discriminative moving and/or static joints, termed "prime joints", to steer self-supervised learning. A new contrastive learning strategy named STJD-CL is proposed to align the representation of a skeleton sequence with that of its prime joints while simultaneously contrasting the representations of prime and nonprime joints. In addition, a method called STJD-MP is developed by integrating STJD with a reconstruction-based framework for more effective learning. Experimental evaluations on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets in various downstream tasks demonstrate that the proposed STJD-CL and STJD-MP improve performance, in particular by 3.5 and 3.6 percentage points over the state-of-the-art contrastive methods on the NTU RGB+D 120 dataset under the X-sub and X-set evaluations, respectively.
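
The contrastive objective described above can be illustrated with a toy sketch. This is not the paper's STJD-CL loss, whose exact form is not given here. It is a minimal InfoNCE-style stand-in under stated assumptions: the prime-joint embedding serves as the anchor, the full-sequence embedding is its positive, and the nonprime-joint embedding is its single negative. The function names, the `temperature` value, and the use of one negative per sample are all hypothetical, and the encoders that would produce the embeddings are omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prime_joint_contrastive_loss(z_seq, z_prime, z_nonprime, temperature=0.1):
    """Toy InfoNCE-style loss (an assumption, not the paper's formulation).

    z_seq      : (B, D) embeddings of full skeleton sequences
    z_prime    : (B, D) embeddings of the prime-joint subsets
    z_nonprime : (B, D) embeddings of the nonprime-joint subsets

    Pulls each prime-joint embedding toward its sequence embedding and
    pushes it away from the matching nonprime-joint embedding.
    """
    z_seq, z_prime, z_nonprime = map(l2_normalize, (z_seq, z_prime, z_nonprime))

    # Cosine similarities scaled by the temperature.
    pos = np.sum(z_prime * z_seq, axis=-1) / temperature
    neg = np.sum(z_prime * z_nonprime, axis=-1) / temperature

    # Numerically stable log-softmax over {positive, negative}.
    logits = np.stack([pos, neg], axis=-1)
    m = logits.max(axis=-1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=-1))
    return float(-(pos - log_norm).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 8))
    # Aligned positive and opposed negative gives a loss near zero.
    print(prime_joint_contrastive_loss(z, z, -z))
```

In this sketch the loss falls as the sequence and prime-joint embeddings align and as the prime and nonprime embeddings separate, which matches the two goals stated in the abstract. A practical version would typically add in-batch negatives and operate on learned encoder outputs.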

Code Implementations: 1 repo
