CV · LG · Jul 18, 2024

SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

arXiv:2407.13460v1 · 24 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the imbalance between variable skeleton sequences and constant class labels in zero-shot skeleton-based action recognition. It is an incremental advance that builds on existing projection-based methods.

The paper tackles the challenge of aligning skeleton features with semantic embeddings in zero-shot skeleton-based action recognition. It proposes SA-DVAE, which uses disentangled variational autoencoders to separate semantic-related features from semantic-irrelevant ones, and reports improved performance on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD benchmarks.

Existing zero-shot skeleton-based action recognition methods use projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE (Semantic Alignment via Disentangled Variational Autoencoders), a method that first adopts feature disentanglement to separate skeleton features into two independent parts, one semantic-related and the other semantic-irrelevant, to better align skeleton and semantic features. We implement this idea via a pair of modality-specific variational autoencoders coupled with a total correlation penalty. We conduct experiments on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and PKU-MMD, and our experimental results show that SA-DVAE produces improved performance over existing methods. The code is available at https://github.com/pha123661/SA-DVAE.
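
To make the disentanglement idea concrete, here is a minimal sketch of what a VAE-style skeleton encoder might do with its latent code. It samples the code with the reparameterization trick, computes the standard Gaussian KL term, and splits the code into a semantic-related part and a semantic-irrelevant part. This is not the authors' implementation (see the linked repository for that). The function names, the latent size, and the choice to split by slicing the first `semantic_dim` coordinates are all assumptions made for illustration. The sketch uses only the Python standard library and does not implement the penalty term mentioned in the abstract or any training loop.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per coordinate."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I), summed over coordinates."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def split_latent(z, semantic_dim):
    """Hypothetical split: the first `semantic_dim` coordinates are treated as
    semantic-related, the remainder as semantic-irrelevant."""
    return z[:semantic_dim], z[semantic_dim:]


# Toy encoder outputs for one skeleton sequence (made-up values).
rng = random.Random(0)
mu = [0.5, -1.0, 0.0, 2.0, 0.3, -0.7]
logvar = [0.0, -0.5, 0.2, 0.0, -1.0, 0.1]

z = reparameterize(mu, logvar, rng)
z_sem, z_irr = split_latent(z, semantic_dim=4)
kl = gaussian_kl(mu, logvar)
```

In a setup like the one the abstract describes, only the semantic-related part (`z_sem` here) would be aligned with the latent code of the class-label text, while the irrelevant part would absorb sequence-specific variation.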
