CV | AI | Jul 13, 2022

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

arXiv:2207.06101v1 | 76 citations | h-index: 32 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of learning complex human actions from skeleton data without labels, which is important for applications like robotics and surveillance, but it appears incremental as it builds on existing transformer-based approaches.

The paper tackles the problem of unsupervised learning of skeleton motion sequences by addressing limitations in capturing global motion, long-range temporal dynamics, and person-to-person interactions, resulting in a model that outperforms state-of-the-art models by notable margins on representative benchmarks.

We propose a new transformer model for unsupervised learning of skeleton motion sequences. The existing transformer model used for unsupervised skeleton-based action learning learns the instantaneous velocity of each joint from adjacent frames, without global motion information. As a result, it has difficulty learning attention globally over whole-body motions and temporally distant joints. In addition, it does not consider person-to-person interactions. To learn whole-body motion, long-range temporal dynamics, and person-to-person interactions, we design a global and local attention mechanism in which global body motions and local joint motions attend to each other. We also propose a novel pretraining strategy, multi-interval pose displacement prediction, to learn both global and local attention over diverse time ranges. The proposed model successfully learns the local dynamics of the joints and captures global context from the motion sequences. Our model outperforms state-of-the-art models by notable margins on the representative benchmarks. Code is available at https://github.com/Boeun-Kim/GL-Transformer.
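To make the idea of global versus local motion over several time intervals concrete, here is a minimal NumPy sketch of how displacement targets of this kind could be computed for one person. It is an illustration built only from the abstract, not the authors' implementation: the function name `multi_interval_displacements`, the choice of joint 0 as the body center, the interval values, and the clamping of early frames to frame 0 are all assumptions.

```python
import numpy as np

def multi_interval_displacements(seq, intervals=(1, 5, 10), center_joint=0):
    """Hypothetical helper: global and local displacements at several intervals.

    seq: array of shape (T, J, 3) with joint coordinates for one person.
    Returns a dict mapping each interval n to a pair
    (global_disp of shape (T, 3), local_disp of shape (T, J, 3)).
    """
    seq = np.asarray(seq, dtype=float)
    center = seq[:, center_joint, :]        # (T, 3) assumed body-center trajectory
    relative = seq - center[:, None, :]     # joints expressed relative to the center
    frames = np.arange(len(seq))
    out = {}
    for n in intervals:
        # Pair frame t with frame t - n; frames earlier than n are clamped to frame 0
        # (an assumption made here to keep the output length equal to T).
        prev = np.maximum(frames - n, 0)
        global_disp = center - center[prev]       # whole-body translation over n frames
        local_disp = relative - relative[prev]    # per-joint motion with translation removed
        out[n] = (global_disp, local_disp)
    return out
```

With an interval of 1 the local term reduces to a frame-to-frame joint velocity, while larger intervals expose slower, longer-range motion. A skeleton that only translates rigidly produces a nonzero global displacement and a zero local displacement, which is the separation the abstract's global and local attention relies on.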

Code Implementations: 1 repo
