CV · Apr 30, 2024

Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition

Zhendong Liu, Haifeng Xia, Tong Guo, Libo Sun, Ming Shao, Siyu Xia

arXiv:2404.19383v1 · 3.74 citations · h-index: 4 · FG

Originality: Incremental advance

AI Analysis

This work addresses the challenge of fine-grained action recognition in sports, which is important for applications like posture correction, but it is incremental as it builds on existing GCN methods.

The paper tackled the problem of capturing fine-grained action changes in skeleton-based sports action recognition by proposing a Cross-block Fine-grained Semantic Cascade (CFSC) module, which improved performance on benchmarks like FSD-10 and a new dataset FD-7.

Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCN-based methods with stacked blocks usually use top-layer semantics for classification or annotation. Although the global features learned this way are suitable for general classification, they have difficulty capturing fine-grained action changes across adjacent frames, which are decisive factors in sports actions. In this paper, we propose a novel "Cross-block Fine-grained Semantic Cascade (CFSC)" module to overcome this challenge. In summary, the proposed CFSC progressively integrates shallow visual knowledge into high-level blocks so that networks can focus on action details. In particular, the CFSC module uses the GCN feature maps produced at different levels, together with aggregated features from preceding levels, to consolidate fine-grained features. In addition, a dedicated temporal convolution is applied at each level to learn short-term temporal features, which are carried over from shallow to deep layers to make full use of low-level details. This cross-block feature aggregation, which mitigates the loss of fine-grained information, results in improved performance. Finally, FD-7, a new action recognition dataset for fencing, was collected and will be made publicly available. Experimental results and empirical analysis on a public benchmark (FSD-10) and the self-collected dataset (FD-7) demonstrate the advantage of our CFSC module over other methods in learning discriminative patterns for action classification.
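The cascade described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`gcn_block`, `temporal_conv`, `cascade_forward`), the fixed smoothing kernel, fusion by plain addition, equal channel widths across blocks, and the toy five-joint chain skeleton are all assumptions made for illustration. The abstract does not specify how the features are fused or how the layers are sized.

```python
import numpy as np

def gcn_block(x, adj, weight):
    """Toy graph-convolution block: ReLU(A X W) applied to every frame.
    x: (T, V, C_in), adj: (V, V), weight: (C_in, C_out)."""
    return np.maximum(np.einsum("uv,tvc,cd->tud", adj, x, weight), 0.0)

def temporal_conv(x, kernel):
    """Short-term temporal filter along the frame axis with zero padding.
    x: (T, V, C); kernel: (K,) with K odd, shared by all joints and channels."""
    half = len(kernel) // 2
    padded = np.pad(x, ((half, half), (0, 0), (0, 0)))
    return sum(w * padded[k:k + x.shape[0]] for k, w in enumerate(kernel))

def cascade_forward(x, adj, weights, kernel):
    """Run stacked blocks while carrying shallow details up to the top level."""
    aggregate = None
    for weight in weights:
        x = gcn_block(x, adj, weight)      # feature map at this level
        detail = temporal_conv(x, kernel)  # short-term temporal features
        # Assumption: fuse with the aggregate of preceding levels by addition.
        aggregate = detail if aggregate is None else aggregate + detail
    # Pool top-level semantics plus cascaded details over frames and joints.
    return (x + aggregate).mean(axis=(0, 1))

rng = np.random.default_rng(0)
T, V, C_in, C = 16, 5, 3, 8  # frames, joints, input channels, hidden channels

# Hypothetical chain skeleton with self-loops, row-normalised.
adj = np.eye(V) + np.eye(V, k=1) + np.eye(V, k=-1)
adj /= adj.sum(axis=1, keepdims=True)

x = rng.standard_normal((T, V, C_in))
weights = [rng.standard_normal((C_in, C))]
weights += [rng.standard_normal((C, C)) for _ in range(2)]
kernel = np.array([0.25, 0.5, 0.25])

features = cascade_forward(x, adj, weights, kernel)
print(features.shape)  # (8,)
```

A conventional stacked GCN would pool only the final `x`. Here the pooled vector also includes `aggregate`, so temporal detail from shallow blocks still reaches the classifier input.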
