Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation
This work addresses LIBRAS gesture recognition for home automation control. The contribution is incremental: it applies existing methods to a specific domain, and the evaluation involves limited user diversity.
The paper tackles dynamic hand gesture recognition for LIBRAS (Brazilian Sign Language) by combining the MediaPipe Hand Landmarker for keypoint extraction with a CNN over spatiotemporal matrices, reporting 95% accuracy under low-light conditions and 92% under normal lighting.
This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the MediaPipe Hand Landmarker, responsible for extracting 21 skeletal keypoints of the hand, and a convolutional neural network (CNN) trained to classify gestures from a 90 × 21 spatiotemporal matrix representation of those keypoints. The method is applied to the recognition of LIBRAS (Brazilian Sign Language) gestures for device control in a home automation system, covering 11 classes of static and dynamic gestures. For real-time inference, a sliding window with temporal frame triplication is used, enabling continuous recognition without recurrent networks. In tests, the method achieved 95% accuracy under low-light conditions and 92% under normal lighting. The results indicate that the approach is effective, although systematic experiments with greater user diversity are needed for a more thorough evaluation of generalization.
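The sliding-window step can be illustrated with a minimal sketch. The abstract does not state how the 90 rows are formed or what each matrix cell holds, so the sketch makes two assumptions: that "triplication" means repeating each of 30 buffered frames three times along the time axis, and that the x, y, z coordinates of each keypoint are kept along a third axis. The class name `SlidingWindow` and the constants are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np

NUM_KEYPOINTS = 21  # from the abstract: keypoints per hand
MATRIX_ROWS = 90    # from the abstract: temporal axis of the matrix
REPEAT = 3          # assumption: each frame is repeated three times
WINDOW = MATRIX_ROWS // REPEAT  # assumption: 30 distinct frames per window


class SlidingWindow:
    """Hypothetical buffer that turns a keypoint stream into CNN inputs."""

    def __init__(self, window=WINDOW, repeat=REPEAT):
        self.repeat = repeat
        # A bounded deque drops the oldest frame when a new one arrives.
        self.frames = deque(maxlen=window)

    def push(self, keypoints):
        """Add one frame of keypoints, shaped (21, 3) for x, y, z."""
        kp = np.asarray(keypoints, dtype=np.float32)
        if kp.shape != (NUM_KEYPOINTS, 3):
            raise ValueError(f"expected shape (21, 3), got {kp.shape}")
        self.frames.append(kp)

    def ready(self):
        """True once the buffer holds a full window of frames."""
        return len(self.frames) == self.frames.maxlen

    def matrix(self):
        """Return the window as a (90, 21, 3) array with repeated frames."""
        if not self.ready():
            raise RuntimeError("window is not full yet")
        stacked = np.stack(list(self.frames))  # (30, 21, 3)
        # Repeat every frame along the time axis: (30, ...) -> (90, ...).
        return np.repeat(stacked, self.repeat, axis=0)
```

In use, each frame's 21 keypoints would be passed to `push`, and once `ready()` returns true, `matrix()` could be fed to the classifier on every new frame. Because the buffer advances by one frame at a time, consecutive inputs overlap heavily, which is how continuous recognition can work without a recurrent network.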