CVHCJun 21, 2024

Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN

arXiv:2406.15003v29 citations
Originality Incremental advance
AI Analysis

This work addresses the need for efficient real-time gesture recognition in applications like VR/AR and assistive technologies, though it appears incremental as it builds on existing methods with optimizations.

The study tackled real-time hand gesture recognition by introducing a skeleton-based framework that converts dynamic gestures into static images, achieving competitive performance on five benchmark datasets with low latency on consumer hardware.

Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that simplifies the recognition of dynamic hand gestures into a static image classification task, effectively reducing both hardware and computational demands. Our framework utilizes a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that optimizes the semantic connections between data representations while minimizing computational needs. Tested across five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the framework showed competitive performance with the state-of-the-art. Its capability to support real-time HGR applications was also demonstrated through deployment on standard consumer PC hardware, showcasing low latency and minimal resource usage in real-world settings. The successful deployment of this framework underscores its potential to enhance real-time applications in fields such as virtual/augmented reality, ambient intelligence, and assistive technologies, providing a scalable and efficient solution for dynamic gesture recognition.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes