GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition
This work addresses temporal modeling in skeleton-based action recognition, a computer vision task, and offers an incremental improvement over existing GCN-based methods.
The paper tackles the challenge of capturing temporal dynamics in skeleton-based action recognition by proposing a G-Dev layer based on path development and integrating it into a hybrid G-DevLSTM module. The resulting model achieves state-of-the-art results on the NTU60, NTU120, and Chalearn2013 datasets, with improved robustness.
Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision. The recent state-of-the-art (SOTA) models for SAR are primarily based on graph convolutional neural networks (GCNs), which are powerful in extracting the spatial information of skeleton data. However, it is not yet clear whether such GCN-based models can effectively capture the temporal dynamics of human action sequences. To this end, we propose the G-Dev layer, which exploits the path development, a principled and parsimonious representation of sequential data that leverages Lie group structure. By integrating the G-Dev layer, the hybrid G-DevLSTM module enhances the traditional LSTM, reducing the time dimension while retaining high-frequency information. It can be conveniently applied to any temporal graph data, complementing existing advanced GCN-based models. Our empirical studies on the NTU60, NTU120 and Chalearn2013 datasets demonstrate that our proposed GCN-DevLSTM network consistently improves on strong GCN baseline models and achieves SOTA results with superior robustness in SAR tasks. The code is available at https://github.com/DeepIntoStreams/GCN-DevLSTM.
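To make "path development" more concrete, here is a minimal NumPy sketch of the general construction for a piecewise-linear path, not the authors' G-Dev layer. It assumes a linear map from the input space R^d into the Lie algebra so(m) of skew-symmetric matrices, built here from a random coefficient tensor `M` (in a network this would be learned). Each increment of the path is mapped to so(m), exponentiated into the orthogonal group SO(m), and the results are multiplied in time order. The windowed variant is a hypothetical illustration of shortening the time axis: one development per window, so every frame-to-frame increment still contributes rather than being dropped by subsampling. All function names are illustrative assumptions, and the G-DevLSTM architecture itself is not reproduced here.

```python
import numpy as np

def skew(A):
    # Map arbitrary square matrices into so(m) by taking A - A^T (skew-symmetric)
    return A - np.swapaxes(A, -1, -2)

def expm(X, terms=20):
    # Matrix exponential via scaling and squaring with a truncated Taylor series
    norm = np.linalg.norm(X, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Y = X / (2 ** s)  # scaled so that ||Y|| <= 1/2
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms + 1):
        term = term @ Y / k
        result = result + term
    for _ in range(s):
        result = result @ result  # undo the scaling: exp(X) = exp(Y)^(2^s)
    return result

def path_development(path, M):
    # path: (T, d) points of one trajectory; M: (d, m, m) coefficients (hypothetical)
    A = skew(M)  # linear map dx -> sum_k dx_k * A_k lands in so(m)
    Z = np.eye(A.shape[-1])
    for dx in np.diff(path, axis=0):
        # Right-multiply by exp of each increment's Lie algebra image
        Z = Z @ expm(np.tensordot(dx, A, axes=1))
    return Z

def windowed_development(path, M, window):
    # Hypothetical time reduction: one development per window of `window` increments;
    # consecutive windows share an endpoint so no increment is lost
    outs = []
    for start in range(0, path.shape[0] - 1, window):
        outs.append(path_development(path[start:start + window + 1], M))
    return np.stack(outs)

rng = np.random.default_rng(0)
path = np.cumsum(rng.normal(size=(9, 3)), axis=0)  # e.g. 9 frames of one joint's 3-D position
M = 0.1 * rng.normal(size=(3, 4, 4))
Z = path_development(path, M)
W = windowed_development(path, M, window=4)
print(W.shape)                      # (2, 4, 4)
print(np.allclose(W[0] @ W[1], Z))  # True
```

The final check illustrates the multiplicative property of the development: the product of the per-window developments recovers the development of the whole path. Because each factor lies in SO(m), `Z` is orthogonal with determinant 1, which keeps the representation numerically bounded regardless of sequence length.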