CV · Dec 11, 2020

EventHands: Real-Time Neural 3D Hand Pose Estimation from an Event Stream

arXiv:2012.06475v3 · 68 citations
AI Analysis

This work addresses the problem of high-speed 3D hand pose estimation for applications requiring low latency and high temporal resolution, such as human-computer interaction or robotics, by leveraging event cameras.

This paper introduces EventHands, a novel neural approach for real-time 3D hand pose estimation using a single event camera. It runs in real time at 1000 Hz and outperforms recent monocular methods that use a colour or depth camera, both in accuracy and in the ability to capture high-speed hand motions.

3D hand pose estimation from monocular videos is a long-standing and challenging problem, which is now seeing a strong upturn. In this work, we address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting to brightness changes. Our EventHands approach has characteristics previously not demonstrated with a single RGB or depth camera, such as high temporal resolution at low data throughputs and real-time performance at 1000 Hz. Due to the different data modality of event cameras compared to classical cameras, existing methods cannot be directly applied to and re-trained for event streams. We thus design a new neural approach which accepts a new event stream representation suitable for learning, which is trained on newly-generated synthetic event streams and can generalise to real data. Experiments show that EventHands outperforms recent monocular methods using a colour (or depth) camera in terms of accuracy and its ability to capture hand motions of unprecedented speed. Our method, the event stream simulator and the dataset are publicly available; see https://4dqv.mpi-inf.mpg.de/EventHands/
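
The abstract does not spell out the event stream representation, so the following is only an illustrative sketch of one common way to turn an asynchronous event stream into a fixed-size input for a neural network. It assumes each event is a tuple (x, y, timestamp, polarity) and encodes the events inside a time window as a two-channel image of normalised timestamps. The function name `events_to_window_image`, the `Event` tuple layout and the `window` parameter are hypothetical and are not taken from the paper or its released code.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp in seconds, polarity in {-1, +1}).
Event = Tuple[int, int, float, int]


def events_to_window_image(
    events: List[Event],
    width: int,
    height: int,
    t_start: float,
    window: float,
) -> List[List[List[float]]]:
    """Encode events in [t_start, t_start + window) as a 2-channel image.

    Channel 0 collects positive-polarity events, channel 1 negative ones.
    Each pixel stores the normalised timestamp of the latest event that
    hit it, so more recent events have larger values. Pixels without
    events stay at 0.0 (an event exactly at t_start also maps to 0.0).
    """
    image = [[[0.0] * width for _ in range(height)] for _ in range(2)]
    # Process events in temporal order so later events overwrite earlier ones.
    for x, y, t, polarity in sorted(events, key=lambda e: e[2]):
        if not (t_start <= t < t_start + window):
            continue  # outside the time window
        if not (0 <= x < width and 0 <= y < height):
            continue  # outside the sensor area
        channel = 0 if polarity > 0 else 1
        image[channel][y][x] = (t - t_start) / window
    return image


# Usage with a tiny 4x3 sensor and a hypothetical 1 ms window.
events = [
    (1, 0, 0.0005, +1),
    (1, 0, 0.0008, +1),  # same pixel, later: overwrites the value above
    (2, 1, 0.0002, -1),
    (0, 0, 0.0020, +1),  # outside the window, ignored
]
image = events_to_window_image(events, width=4, height=3, t_start=0.0, window=0.001)
print(image[0][0][1], image[1][1][2])
```

Because the output has a fixed size regardless of how many events arrive, a window like this can be fed to a standard convolutional network and slid forward in small time steps, which is one way high update rates become feasible.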

Code Implementations: 1 repo