Action Recognition with Domain Invariant Features of Skeleton Image
This work addresses the domain shift problem in skeleton-based action recognition for computer vision applications, offering an incremental improvement over existing methods.
The paper tackles skeleton-based action recognition by addressing the loss of joint correlations in CNN-based methods, proposing a novel CNN-based approach with adversarial training that improves generalization across view angles and subjects. The method achieves competitive results on the NTU RGB+D dataset, with accuracy gains over the baseline of 2.4% in the cross-subject setting and 1.9% in the cross-view setting.
Owing to its fast processing speed and robustness, skeleton-based action recognition has recently received attention from the computer vision community. Recent Convolutional Neural Network (CNN)-based methods, which feed a skeleton image to a CNN, have shown commendable performance in learning spatio-temporal representations of skeleton sequences. Because these methods mainly encode time steps and skeleton joints simply as rows and columns, respectively, a local 2D convolution mainly mixes joints that happen to be adjacent in the column order, so the latent correlations among all joints may be lost. To solve this problem, we propose a novel CNN-based method with adversarial training for action recognition. We introduce two-level domain adversarial learning that aligns the features of skeleton images across different view angles and across different subjects, respectively, further improving generalization. We evaluated the proposed method on NTU RGB+D. It achieves competitive results compared with state-of-the-art methods, with accuracy gains of 2.4% and 1.9% over the baseline in the cross-subject and cross-view settings, respectively.
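The abstract names two mechanisms without detailing them: encoding a skeleton sequence as an image (frames as rows, joints as columns, coordinates as channels) and adversarially aligning features across domains. The Python sketch below illustrates both under stated assumptions. The per-channel min-max normalization to [0, 255] and the use of a gradient reversal layer (a common way to implement domain-adversarial training) are assumptions, not details given by the authors, and `skeleton_to_image` and `GradientReversal` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) coordinates of one joint


def skeleton_to_image(frames: Sequence[Sequence[Joint]]) -> List[List[List[float]]]:
    """Encode a skeleton sequence as a T x J x 3 "image".

    Rows index frames (time), columns index joints, and the three channels
    hold x, y, z. Each channel is min-max normalized to [0, 255] over the
    whole sequence (an assumed normalization, for illustration only).
    Assumes at least one frame with at least one joint.
    """
    num_channels = 3
    lo = [min(j[c] for f in frames for j in f) for c in range(num_channels)]
    hi = [max(j[c] for f in frames for j in f) for c in range(num_channels)]
    image = []
    for frame in frames:  # one image row per frame
        row = []
        for joint in frame:  # one pixel (column) per joint
            pixel = []
            for c in range(num_channels):
                span = hi[c] - lo[c]
                # Constant channels map to 0 to avoid division by zero.
                pixel.append(0.0 if span == 0 else 255.0 * (joint[c] - lo[c]) / span)
            row.append(pixel)
        image.append(row)
    return image


class GradientReversal:
    """Hand-written gradient reversal layer (assumed, not from the paper).

    forward() is the identity; backward() multiplies the incoming gradient
    by -lambda_, so the feature extractor below this layer is pushed to
    increase the domain classifier's loss, i.e. to make its features less
    predictive of the domain (view angle or subject).
    """

    def __init__(self, lambda_: float = 1.0):
        self.lambda_ = lambda_

    def forward(self, features: Sequence[float]) -> List[float]:
        return list(features)  # identity in the forward pass

    def backward(self, grad_output: Sequence[float]) -> List[float]:
        return [-self.lambda_ * g for g in grad_output]  # reversed, scaled gradient
```

In a two-level setup, one could place such a layer in front of each of two domain classifiers, one predicting the view angle and one predicting the subject; that arrangement is likewise an illustrative assumption rather than the authors' stated architecture.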