CV · AI · Sep 9, 2024

ReL-SAR: Representation Learning for Skeleton Action Recognition with Convolutional Transformers and BYOL

arXiv:2409.05749v1 · 5 citations · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of high annotation and computation costs in skeleton action recognition by enabling effective use of unlabeled data, though it is incremental as it builds on existing methods like transformers and BYOL.

The paper tackles unsupervised representation learning for skeleton action recognition with a lightweight convolutional transformer that jointly models spatial and temporal cues, trained with a BYOL-based learning strategy. It reports competitive results on four limited-size datasets: MCAD, IXMAS, JHMDB, and NW-UCLA.

To extract robust and generalizable skeleton action recognition features, large amounts of well-curated data are typically required, which is a challenging task hindered by annotation and computation costs. Therefore, unsupervised representation learning is of prime importance to leverage unlabeled skeleton data. In this work, we investigate unsupervised representation learning for skeleton action recognition. For this purpose, we designed a lightweight convolutional transformer framework, named ReL-SAR, exploiting the complementarity of convolutional and attention layers for jointly modeling spatial and temporal cues in skeleton sequences. We also use a Selection-Permutation strategy for skeleton joints to ensure more informative descriptions from skeletal data. Finally, we capitalize on Bootstrap Your Own Latent (BYOL) to learn robust representations from unlabeled skeleton sequence data. We achieved very competitive results on limited-size datasets: MCAD, IXMAS, JHMDB, and NW-UCLA, showing the effectiveness of our proposed method against state-of-the-art methods in terms of both performance and computational efficiency. To ensure reproducibility and reusability, the source code including all implementation parameters is provided at: https://github.com/SafwenNaimi/Representation-Learning-for-Skeleton-Action-Recognition-with-Convolutional-Transformers-and-BYOL
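The BYOL objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the two standard ingredients of BYOL as generally described: a regression loss between the online network's prediction and the target network's projection, and an exponential moving average (EMA) update of the target weights. The function names, the plain-list vector representation, and the decay value `tau=0.99` are assumptions made for illustration.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

def byol_loss(online_prediction, target_projection):
    """BYOL loss for one view pair: 2 - 2 * cosine similarity.

    During training the target projection is treated as a constant
    (stop-gradient); only the online network receives gradients.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def symmetric_byol_loss(pred_a, proj_b, pred_b, proj_a):
    """Symmetrized loss: each augmented view predicts the other one."""
    return byol_loss(pred_a, proj_b) + byol_loss(pred_b, proj_a)

def ema_update(target_params, online_params, tau=0.99):
    """Target update: xi <- tau * xi + (1 - tau) * theta (tau is assumed)."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

if __name__ == "__main__":
    # Toy vectors standing in for embeddings of two augmented skeleton views.
    print(byol_loss([1.0, 0.0], [2.0, 0.0]))   # aligned vectors, loss near 0
    print(byol_loss([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors, loss near 2
    print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9))
```

In a real training loop the vectors would come from the encoder, projector, and predictor heads applied to two augmentations of the same unlabeled skeleton sequence, and the EMA step would run after every optimizer update.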

Code Implementations: 1 repo
