CV RO IV OPTICSMar 16, 2025

EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera

Luming Wang, Hao Shi, Xiaoting Yin, Kailun Yang, Kaiwei Wang, Jian Bai

arXiv:2503.12419v34 citationsh-index: 39Has CodeSMC

Originality Incremental advance

AI Analysis

This addresses gesture recognition for human-computer interaction in dynamic scenarios, though it appears incremental with specific architectural improvements.

The paper tackles egocentric gesture recognition by developing a novel network architecture for event camera data, achieving 62.7% accuracy on unseen subjects (3.1% higher than SOTA) with only 7M parameters and 97.0% accuracy on a public dataset.

Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing asynchronous event streams due to their synchronous frame-based nature. Moreover, from an egocentric perspective, event cameras record data that includes events generated by both head movements and hand gestures, thereby increasing the complexity of gesture recognition. To address this, we propose a novel network architecture specifically designed for event data processing, incorporating (1) a lightweight CNN with asymmetric depthwise convolutions to reduce parameters while preserving spatiotemporal features, (2) a plug-and-play state-space model as context block that decouples head movement noise from gesture dynamics, and (3) a parameter-free Bins-Temporal Shift Module (BTSM) that shifts features along bins and temporal dimensions to fuse sparse events efficiently. We further establish the EgoEvGesture dataset, the first large-scale dataset for egocentric gesture recognition using event cameras. Experimental results demonstrate that our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches. Notable misclassifications in freestyle motions stem from high inter-personal variability and unseen test patterns differing from training data. Moreover, our approach achieved a remarkable accuracy of 97.0% on the DVS128 Gesture, demonstrating the effectiveness and generalization capability of our method on public datasets. The dataset and models are made available at https://github.com/3190105222/EgoEv_Gesture.

View on arXiv PDF Code

Similar