Spatio-Temporal Dual Affine Differential Invariant for Skeleton-based Action Recognition
This work addresses action recognition for applications such as surveillance and human-computer interaction, but it appears incremental, as it builds on existing skeleton-based methods by adding a new feature and a channel augmentation step.
The paper tackles action recognition from skeleton data by proposing a novel feature, the spatio-temporal dual affine differential invariant (STDADI), to handle distortions modeled as spatial and temporal affine transformations. It reports remarkable improvements over previous state-of-the-art methods on the NTU-RGB+D and NTU-RGB+D 120 datasets.
The dynamics of human skeletons carry significant information for the task of action recognition. The similarity between trajectories of corresponding joints is an indicative feature of the same action, but this similarity may be subject to distortions that can be modeled as a combination of spatial and temporal affine transformations. In this work, we propose a novel feature called spatio-temporal dual affine differential invariant (STDADI). Furthermore, to improve the generalization ability of neural networks, a channel augmentation method is proposed. On the large-scale action recognition dataset NTU-RGB+D and its extended version NTU-RGB+D 120, the proposed method achieves remarkable improvements over previous state-of-the-art methods.
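
The distortion model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not compute STDADI itself, whose definition is not given here. It only shows what "a combination of spatial and temporal affine transformations" could mean for one joint trajectory, assuming the spatial part maps each 3D position x to A x + b and the temporal part maps each timestamp t to a t + c. The function names, the sample trajectory, and the parameter values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spatial_affine(points: Sequence[Point],
                   A: Sequence[Sequence[float]],
                   b: Sequence[float]) -> List[Point]:
    """Apply the spatial affine map x -> A x + b to every 3D joint position."""
    return [
        tuple(sum(A[i][j] * p[j] for j in range(3)) + b[i] for i in range(3))
        for p in points
    ]


def temporal_affine(times: Sequence[float], a: float, c: float) -> List[float]:
    """Apply the temporal affine map t -> a t + c to every timestamp."""
    if a == 0:
        raise ValueError("temporal scale a must be non-zero")
    return [a * t + c for t in times]


# Hypothetical trajectory of a single joint: three frames, (x, y, z) per frame.
times = [0.0, 1.0, 2.0]
joint = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

# Assumed distortion: stretch along x, shift along z, and a time axis
# that is compressed by half and offset by one unit.
A = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = (0.0, 0.0, 1.0)

distorted_joint = spatial_affine(joint, A, b)
distorted_times = temporal_affine(times, a=0.5, c=1.0)
```

Under this reading, the original and distorted trajectories describe the same action, so a feature that is invariant to both maps would take the same value on each.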