CV · Apr 3, 2024

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition

arXiv:2404.02624v1 · 1 citation · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses action recognition from skeleton data, which is important for applications like human-computer interaction, but it appears incremental as it builds on existing GCN and attention methods.

The paper tackles skeleton-based action recognition by proposing a Multi-Scale Spatial-Temporal self-attention GCN (MSST-GCN) to improve modeling ability, achieving state-of-the-art results on several datasets.

Skeleton-based gesture recognition methods have achieved high success using Graph Convolutional Networks (GCNs). In addition, context-dependent adaptive topology, which supplies neighborhood vertex information, and attention mechanisms help a model better represent actions. In this paper, we propose a self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN, to improve modeling ability and achieve state-of-the-art results on several datasets. We use a spatial self-attention module with adaptive topology to capture intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames for each node. These two modules are followed by a multi-scale convolution network with dilations, which captures not only the long-range temporal dependencies of joints but also the long-range spatial dependencies (i.e., long-distance dependencies) of node temporal behaviors. The resulting features are combined into high-level spatial-temporal representations, and a softmax classifier outputs the predicted action.
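The pipeline the abstract describes (spatial self-attention within each frame, temporal self-attention per joint, multi-scale dilated temporal aggregation, then a softmax classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Every function name here is hypothetical, and several simplifications are assumed: attention uses identity query/key/value projections, the "adaptive topology" is a fixed additive score bias rather than a learned one, the dilated convolution uses a uniform 3-tap kernel, and the scales are averaged rather than combined as in the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, bias=None):
    """Scaled dot-product self-attention with identity projections (assumption).
    tokens: list of n feature vectors; bias: optional n x n additive score bias."""
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d)
                  for j in range(n)]
        if bias is not None:
            scores = [s + bias[i][j] for j, s in enumerate(scores)]
        w = softmax(scores)
        out.append([sum(w[j] * tokens[j][c] for j in range(n)) for c in range(d)])
    return out

def spatial_attention(x, topology):
    """Attend across joints inside each frame; x[t][v] is joint v in frame t.
    topology stands in for the adaptive topology (fixed here, an assumption)."""
    return [self_attention(frame, bias=topology) for frame in x]

def temporal_attention(x):
    """Attend across frames separately for each joint."""
    num_frames, num_joints = len(x), len(x[0])
    per_joint = [self_attention([x[t][v] for t in range(num_frames)])
                 for v in range(num_joints)]
    return [[per_joint[v][t] for v in range(num_joints)] for t in range(num_frames)]

def dilated_temporal_mean(x, dilation):
    """Uniform 3-tap temporal filter with the given dilation, a stand-in for a
    learned dilated convolution. Out-of-range taps are skipped."""
    num_frames, num_joints, d = len(x), len(x[0]), len(x[0][0])
    out = []
    for t in range(num_frames):
        taps = [t + k * dilation for k in (-1, 0, 1)
                if 0 <= t + k * dilation < num_frames]
        out.append([[sum(x[s][v][c] for s in taps) / len(taps) for c in range(d)]
                    for v in range(num_joints)])
    return out

def msst_block(x, topology, dilations=(1, 2)):
    """Spatial attention, then temporal attention, then averaged multi-scale branches."""
    h = temporal_attention(spatial_attention(x, topology))
    branches = [dilated_temporal_mean(h, r) for r in dilations]
    d = len(h[0][0])
    return [[[sum(b[t][v][c] for b in branches) / len(branches) for c in range(d)]
             for v in range(len(h[0]))] for t in range(len(h))]

def classify(features, class_weights):
    """Global average pooling over frames and joints, linear layer, softmax."""
    d = len(features[0][0])
    count = len(features) * len(features[0])
    pooled = [sum(joint[c] for frame in features for joint in frame) / count
              for c in range(d)]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in class_weights]
    return softmax(logits)
```

A call such as `classify(msst_block(x, topology), class_weights)` takes a nested list `x` indexed as frame, joint, channel and returns one probability per class. The input and output of `msst_block` have the same shape, so blocks of this kind could in principle be stacked.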

