CV, HC | Jun 18, 2024

Deep self-supervised learning with visualisation for automatic gesture recognition

arXiv:2406.12440v1
Originality: Synthesis-oriented
AI Analysis

This work addresses gesture recognition for human-machine interaction, but it is incremental as it applies existing methods to a specific domain.

The paper tackles automatic gesture recognition by exploring supervised learning, self-supervised learning, and visualization techniques on 3D skeleton data. It finds that self-supervised learning increases accuracy in simulated settings and that Grad-CAM confirms the models focus on relevant joints.

Gesture is an important means of non-verbal communication; the visual modality allows humans to convey information during interaction, facilitating both human-to-human and human-machine interaction. However, recognising gestures automatically is considered difficult. In this work, we explore three different means of recognising hand signs using deep learning: supervised learning based methods, self-supervised methods and visualisation based techniques, applied to 3D moving skeleton data. Self-supervised learning is used to train fully connected, CNN and LSTM models. Then, a reconstruction method is applied to unlabelled data in simulated settings using a CNN as a backbone, and the learnt features are used to perform prediction on the remaining labelled data. Lastly, Grad-CAM is applied to discover the focus of the models. Our experimental results show that the supervised learning methods are capable of recognising gestures accurately, with self-supervised learning increasing the accuracy in simulated settings. Finally, Grad-CAM visualisation shows that the models indeed focus on the skeleton joints relevant to the associated gesture.
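To make the Grad-CAM step concrete, here is a minimal, dependency-free sketch of how a per-joint relevance score could be computed. It is not the paper's implementation: it assumes a toy classification head (global average pooling over joints followed by a linear layer), for which the gradient is known analytically, and the function name, activations and weights are all hypothetical.

```python
def grad_cam_joint_scores(feature_maps, class_weights):
    """Grad-CAM for a toy head: logit_c = sum_k w[k] * mean_j A[k][j].

    feature_maps: K lists of length J (one activation per channel and joint).
    class_weights: K weights of the linear layer for the target class.
    Returns J non-negative relevance scores, one per joint.
    """
    num_joints = len(feature_maps[0])
    # Analytic gradient of the logit w.r.t. A[k][j] for GAP + linear: w[k] / J.
    grads = [[w / num_joints] * num_joints for w in class_weights]
    # Grad-CAM channel weights: gradients averaged over positions (joints).
    alphas = [sum(g) / num_joints for g in grads]
    # Weighted sum of feature maps over channels, followed by ReLU.
    cam = []
    for j in range(num_joints):
        s = sum(a * fm[j] for a, fm in zip(alphas, feature_maps))
        cam.append(max(0.0, s))
    return cam


# Hypothetical activations: 2 channels over 4 joints.
features = [
    [0.0, 2.0, 4.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
]
weights = [1.0, -1.0]  # hypothetical linear-layer weights for the target class
scores = grad_cam_joint_scores(features, weights)
print(scores)
```

In a real network the gradients would come from automatic differentiation rather than a closed form, but the weighting, summation and ReLU steps are the same. Joints with the largest scores are the ones the model relied on most for the chosen class.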

Code Implementations: 1 repo