CV HC Jun 18, 2024

Deep self-supervised learning with visualisation for automatic gesture recognition

Fabien Allemand, Alessio Mazzela, Jun Villette, Decky Aspandi, Titus Zaharia

arXiv:2406.12440v12.0 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses gesture recognition for human-machine interaction, but it is incremental as it applies existing methods to a specific domain.

The paper tackled automatic gesture recognition by exploring supervised learning, self-supervised learning, and visualization techniques on 3D skeleton data, finding that self-supervised learning increased accuracy in simulated settings and Grad-CAM confirmed models focused on relevant joints.

Gesture is an important means of non-verbal communication: the visual modality allows humans to convey information during interaction, facilitating both human-human and human-machine interaction. However, automatic gesture recognition is considered difficult. In this work, we explore three different means of recognising hand signs using deep learning: supervised learning based methods, self-supervised methods and visualisation based techniques applied to 3D moving skeleton data. Supervised learning is used to train fully connected, CNN and LSTM models. Then, a reconstruction method is applied to unlabelled data in simulated settings using a CNN as a backbone, and the learnt features are used to perform prediction on the remaining labelled data. Lastly, Grad-CAM is applied to discover the focus of the models. Our experimental results show that the supervised learning methods are capable of recognising gestures accurately, with self-supervised learning increasing the accuracy in simulated settings. Finally, Grad-CAM visualisation shows that the models indeed focus on the skeleton joints relevant to the associated gesture.
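
The Grad-CAM step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the last convolutional layer keeps a (channels, frames, joints) layout, that the gradients of the class score with respect to those activations are already available, and it uses a hypothetical 21-joint hand skeleton with toy random data. The names `grad_cam` and `joint_importance` are illustrative only.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from conv activations and class-score gradients.

    Both inputs have shape (K, T, J) = (channels, frames, joints); this
    layout is an assumption made for the sketch.
    Returns a (T, J) heatmap scaled to [0, 1].
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if feature_maps.shape != gradients.shape:
        raise ValueError("feature_maps and gradients must have the same shape")
    # Channel weights: average the gradients over frames and joints.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def joint_importance(cam):
    """Average the heatmap over frames to get one score per joint."""
    return cam.mean(axis=0)


# Toy data (hypothetical sizes): 4 channels, 10 frames, 21 joints.
rng = np.random.default_rng(0)
K, T, J = 4, 10, 21
activations = rng.random((K, T, J))
activations[:, :, 8] += 2.0           # make joint 8 strongly activated
grads = np.full((K, T, J), 0.1)       # uniform positive gradients

cam = grad_cam(activations, grads)
scores = joint_importance(cam)
top_joint = int(scores.argmax())
```

In a real pipeline the activations and gradients would come from the trained network (for example via framework hooks), and the per-joint scores could then be compared against the joints expected to matter for each gesture.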
