CV, AI, Sep 20, 2021

Dynamic Gesture Recognition

arXiv:2109.09396v3
AI Analysis

This work addresses gesture recognition for human-machine interaction, but it is incremental as it applies known hybrid methods to a specific domain.

The paper tackles dynamic gesture recognition for Italian sign language by combining a CNN and an RNN to process video frames, improving prediction accuracy through temporal context and data augmentation.

Human-Machine Interaction (HMI) is an important research topic in machine learning that has been deeply investigated thanks to the rise of computing power in recent years. For the first time, it is possible to use machine learning to classify images and/or videos instead of relying on traditional computer vision algorithms. The aim of this paper is to build a symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN) to recognize cultural/anthropological Italian sign language gestures from videos. The CNN extracts important features that are later used by the RNN. The RNN stores temporal information inside the model, providing contextual information from previous frames to enhance prediction accuracy. Our novel approach applies different data augmentation techniques and regularization methods to RGB frames only, in order to avoid overfitting and achieve a small generalization error.
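The data flow described in the abstract (per-frame features feeding a recurrent state that carries context across frames) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: `frame_features` is a trivial stand-in for the CNN (it only averages the RGB channels), the recurrence is a plain Elman update, and the weights are random and untrained. It does not reproduce the paper's architecture, layer sizes, or training procedure.

```python
import math
import random


def frame_features(frame):
    """Hypothetical stand-in for the CNN: mean of each RGB channel.

    `frame` is a list of (r, g, b) pixel tuples with values in [0, 1].
    """
    n = len(frame)
    return [sum(px[c] for px in frame) / n for c in range(3)]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(w_x, w_h, x, h_prev):
    """Elman update: h_t = tanh(W_x x_t + W_h h_{t-1})."""
    a = matvec(w_x, x)
    b = matvec(w_h, h_prev)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def classify_clip(clip, w_x, w_h, w_out):
    """Feed per-frame features through the recurrence, then score classes."""
    h = [0.0] * len(w_h)  # hidden state starts empty (no temporal context yet)
    for frame in clip:
        h = rnn_step(w_x, w_h, frame_features(frame), h)
    scores = matvec(w_out, h)
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, h


def random_matrix(rng, rows, cols):
    """Untrained weights, drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN, CLASSES = 4, 3  # illustrative sizes, not taken from the paper
w_x = random_matrix(rng, HIDDEN, 3)
w_h = random_matrix(rng, HIDDEN, HIDDEN)
w_out = random_matrix(rng, CLASSES, HIDDEN)

# Toy clip: 5 frames of 8 random RGB pixels each.
clip = [
    [(rng.random(), rng.random(), rng.random()) for _ in range(8)]
    for _ in range(5)
]
label, state = classify_clip(clip, w_x, w_h, w_out)
```

Because the hidden state `h` is passed from one frame to the next, the final class scores depend on the whole sequence rather than on the last frame alone, which is the role the abstract assigns to the RNN.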

