Temporal-Channel Topology Enhanced Network for Skeleton-Based Action Recognition
This work addresses the challenge of long-distance correlation modeling in skeleton-based action recognition with a novel CNN approach that is reported to outperform existing methods, although its improvement to topology modeling is incremental.
The paper tackles skeleton-based action recognition by proposing TCTE-Net, a CNN architecture that learns spatial and temporal topologies. On the NTU RGB+D, NTU RGB+D 120, and FineGym datasets, it reports state-of-the-art performance among CNN-based methods and superior performance compared to GCN-based methods.
Skeleton-based action recognition has become popular in recent years due to its efficiency and robustness. Most current methods adopt graph convolutional networks (GCNs) for topology modeling, but GCN-based methods are limited in long-distance correlation modeling and generalizability. In contrast, the potential of convolutional neural networks (CNNs) for topology modeling has not been fully explored. In this paper, we propose a novel CNN architecture, the Temporal-Channel Topology Enhanced Network (TCTE-Net), to learn spatial and temporal topologies for skeleton-based action recognition. TCTE-Net consists of two modules: the Temporal-Channel Focus module, which learns a temporal-channel focus matrix to identify the most critical feature representations, and the Dynamic Channel Topology Attention module, which dynamically learns spatial topological features and fuses them with an attention mechanism to model long-distance channel-wise topology. We conduct experiments on the NTU RGB+D, NTU RGB+D 120, and FineGym datasets. TCTE-Net achieves state-of-the-art performance among CNN-based methods and superior performance compared to GCN-based methods. The code is available at https://github.com/aikuniverse/TCTE-Net.
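To make the two ideas in the abstract more concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the abstract does not specify how either module is computed, so the pooling-plus-sigmoid gate, the distance-based attention scores, and the function names `temporal_channel_focus` and `dynamic_channel_topology` are all assumptions chosen only to show the general shape of a temporal-channel weighting and of a per-channel topology fused with attention. Real modules would use learned parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_channel_focus(x):
    """Hypothetical focus gate (assumption, not the paper's module).

    x has shape [C][T][V]: channels, frames, joints.
    Returns (focus, out): focus is a C x T matrix of weights in (0, 1),
    and out is x with every joint value scaled by its (channel, frame) weight.
    """
    # Pool over joints so each (channel, frame) pair gets a single score.
    focus = [[sigmoid(sum(frame) / len(frame)) for frame in channel]
             for channel in x]
    out = [[[value * focus[c][t] for value in frame]
            for t, frame in enumerate(channel)]
           for c, channel in enumerate(x)]
    return focus, out


def dynamic_channel_topology(x, shared):
    """Hypothetical channel-wise topology with attention (assumption).

    x has shape [C][V]: per-channel joint features for one frame.
    shared is a V x V static topology used by every channel.
    Each channel adds its own data-dependent attention weights to the
    shared topology, then aggregates features over all joints.
    """
    out = []
    for channel in x:
        row_out = []
        for v, xv in enumerate(channel):
            # Attention spans all joints, so distant joints interact directly.
            attn = softmax([-abs(xv - xu) for xu in channel])
            weights = [shared[v][u] + attn[u] for u in range(len(channel))]
            row_out.append(sum(w * xu for w, xu in zip(weights, channel)))
        out.append(row_out)
    return out
```

In this sketch the focus matrix has one weight per channel and frame, and the topology differs per channel because the attention term depends on that channel's features. Both choices follow the abstract's wording only loosely and should be read as a reading aid rather than a description of TCTE-Net.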