CV · Feb 4, 2022

Bootstrapped Representation Learning for Skeleton-Based Action Recognition

arXiv:2202.02232v2 · 17 citations
AI Analysis

This work aims to improve the accuracy of skeleton-based action recognition, a problem relevant to computer vision researchers and practitioners. The contribution is incremental, since it builds on the existing BYOL method.

The paper tackles self-supervised representation learning for 3D skeleton-based action recognition. It extends BYOL with new data augmentations and multi-viewpoint sampling, and reports state-of-the-art performance on the NTU-60 and NTU-120 datasets in both linear evaluation and semi-supervised benchmarks.

In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) for representation learning on skeleton sequence data and propose a new data augmentation strategy including two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that the performance can be further improved by knowledge distillation from wider networks, leveraging once more the unlabeled samples. We conduct extensive experiments on the NTU-60 and NTU-120 datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on both linear evaluation and semi-supervised benchmarks.
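As a rough illustration of the ingredients named in the abstract, the sketch below shows the standard BYOL regression loss (2 minus twice the cosine similarity between the online prediction and the target projection) and the exponential-moving-average update of the target network. It is a minimal pure-Python sketch, not the authors' implementation. The function names, the plain-list vectors, and the default `tau` are assumptions made for illustration. The helper `sample_view_pair` is one plausible reading of multi-viewpoint sampling (using two different camera recordings of the same action as the two inputs). It is hypothetical, and the paper's actual sampling rule may differ.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector (list of floats) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def byol_loss(online_prediction, target_projection):
    """Standard BYOL loss: 2 - 2 * cosine similarity of the two vectors.

    In training, the target projection is treated as a constant
    (no gradient flows through the target network).
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters toward the online parameters.

    tau close to 1 means the target network changes slowly.
    The default value here is an illustrative assumption.
    """
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]


def sample_view_pair(views, rng):
    """Hypothetical multi-viewpoint sampling (an assumption, see above).

    `views` holds recordings of the same action from different cameras.
    Returns two distinct views when available, otherwise the single
    view twice, leaving augmentation as the only source of variation.
    """
    if len(views) >= 2:
        first, second = rng.sample(views, 2)
        return first, second
    return views[0], views[0]
```

In a full pipeline, each sampled view would pass through its own augmentation pipeline before being encoded, and the loss is usually symmetrized by swapping the roles of the two views and summing both terms.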

