CV · Jul 20, 2022

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

DeepMind
arXiv:2207.09644v3 · 61 citations · h-index: 104
Originality: Incremental advance
AI Analysis

This work addresses the difficulty of acquiring large-scale skeleton annotations, a bottleneck for researchers in human motion analysis, and offers an incremental improvement over existing contrastive learning methods.

The paper tackles self-supervised pre-training for human skeleton sequence representations. It proposes a hierarchical pre-training scheme with a Transformer-based encoder that captures spatial and temporal dependencies at multiple levels, and reports state-of-the-art performance on action recognition, action detection, and motion prediction.

Despite the success of fully-supervised human skeleton sequence modeling, self-supervised pre-training for skeleton sequence representation learning has become an active field, because acquiring task-specific skeleton annotations at large scale is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hierarchical spatial-temporal nature of human skeletons. Different from such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS) to explicitly capture spatial, short-term temporal, and long-term temporal dependencies at the frame, clip, and video levels, respectively. To evaluate the proposed pre-training scheme with Hi-TRS, we conduct extensive experiments on three skeleton-based downstream tasks: action recognition, action detection, and motion prediction. Under both supervised and semi-supervised evaluation protocols, our method achieves state-of-the-art performance. Additionally, we demonstrate that the prior knowledge learned by our model in the pre-training stage transfers well to different downstream tasks.
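To make the frame, clip, and video hierarchy concrete, here is a minimal, hypothetical sketch of how such a three-level encoder could be wired. It is not the authors' Hi-TRS implementation: it assumes parameter-free single-head attention (identity query/key/value projections), one residual connection per level, and mean pooling to summarize each level into a single token. The names `hierarchical_encode`, `encode_level`, and `clip_len` are illustrative inventions. Frames attend over their joints (spatial), clips attend over their frames (short-term temporal), and the video attends over its clips (long-term temporal).

```python
import math
from typing import List

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (no learned weights), for illustration only.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value vectors (values == tokens here).
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a list of tokens into one summary vector."""
    d = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]


def encode_level(tokens: List[Vector]) -> Vector:
    """One hierarchy level: attention plus residual, then pool to a single token."""
    attended = self_attention(tokens)
    mixed = [[a + b for a, b in zip(t, at)] for t, at in zip(tokens, attended)]
    return mean_pool(mixed)


def hierarchical_encode(sequence: List[List[Vector]], clip_len: int) -> Vector:
    """Encode a skeleton sequence shaped [T frames][J joints][C coords].

    Hypothetical simplification of a frame -> clip -> video hierarchy.
    """
    # Frame level (spatial): attend across the joints of each frame.
    frame_feats = [encode_level(frame) for frame in sequence]
    # Clip level (short-term temporal): attend across frames within each clip.
    clips = [frame_feats[i:i + clip_len] for i in range(0, len(frame_feats), clip_len)]
    clip_feats = [encode_level(clip) for clip in clips]
    # Video level (long-term temporal): attend across clip summaries.
    return encode_level(clip_feats)
```

As a sanity check, a sequence of 8 frames with 5 joints whose coordinates are all `[1.0, 1.0, 1.0]` encodes to approximately `[8.0, 8.0, 8.0]` with `clip_len=4`. Attention over identical tokens returns the token unchanged, so each level's residual connection doubles the vector. A trained encoder would instead use learned projections, positional embeddings, and several Transformer layers per level.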

Code Implementations: 1 repo