CV · Feb 4, 2022

Bootstrapped Representation Learning for Skeleton-Based Action Recognition

Olivier Moliner, Sangxia Huang, Kalle Åström

arXiv:2202.02232v25.717 citations

Originality: Incremental advance

AI Analysis

This work aims to improve the accuracy of skeleton-based action recognition, a problem of interest to computer vision researchers and practitioners. The advance is incremental, since it builds on the existing BYOL method.

The paper tackles self-supervised representation learning for 3D skeleton-based action recognition by extending BYOL with new data augmentations and multi-viewpoint sampling, reporting state-of-the-art results on the NTU-60 and NTU-120 datasets under linear evaluation and on semi-supervised benchmarks.

In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) for representation learning on skeleton sequence data and propose a new data augmentation strategy including two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that the performance can be further improved by knowledge distillation from wider networks, leveraging once more the unlabeled samples. We conduct extensive experiments on the NTU-60 and NTU-120 datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on both linear evaluation and semi-supervised benchmarks.
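For readers unfamiliar with BYOL, the two ingredients the abstract builds on can be sketched as follows. This is a minimal NumPy illustration of the generic BYOL objective (a normalized regression loss between the online network's prediction and the target network's projection) and of the exponential-moving-average update of the target weights. The function names `byol_loss` and `ema_update`, the dictionary-of-arrays parameter layout, and the value of `tau` are assumptions made for this sketch; the paper's skeleton encoder, its asymmetric augmentation pipelines, and its multi-viewpoint sampling are not reproduced here.

```python
import numpy as np


def byol_loss(online_pred, target_proj):
    """Per-sample BYOL regression loss: 2 - 2 * cosine similarity.

    online_pred: (batch, dim) predictions from the online network.
    target_proj: (batch, dim) projections from the target network
                 (treated as constants, i.e. no gradient flows through them).
    """
    p = online_pred / np.linalg.norm(online_pred, axis=-1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=-1, keepdims=True)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)


def ema_update(target_params, online_params, tau=0.99):
    """Move each target weight toward the matching online weight.

    Both arguments are dicts mapping (hypothetical) parameter names to arrays.
    tau close to 1 means the target network changes slowly.
    """
    return {
        name: tau * target_params[name] + (1.0 - tau) * online_params[name]
        for name in target_params
    }


# Toy usage: two augmented views of the same sequence would each be encoded,
# and the loss is usually symmetrized by swapping the roles of the views.
rng = np.random.default_rng(0)
pred_view1 = rng.normal(size=(4, 8))
proj_view2 = rng.normal(size=(4, 8))
loss = byol_loss(pred_view1, proj_view2).mean()
```

In a multi-viewpoint setting such as the one the abstract describes, the two inputs could come from different cameras recording the same action instead of two augmentations of one recording; how the authors combine these is specified in the paper, not in this sketch.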
