3D Hand Pose Detection in Egocentric RGB-D Images
This work addresses reliable hand pose estimation during everyday activities from a first-person perspective, with applications such as human-computer interaction and robotics. The contribution is incremental, since it builds on existing tracking-by-detection methods.
The paper tackles 3D hand pose estimation from egocentric viewpoints in RGB-D images, a task made difficult by occlusions and a limited field of view. It uses a discriminative tracking-by-detection framework with priors from synthetic data, and achieves state-of-the-art performance for hand detection and pose estimation on a real annotated dataset.
We focus on the task of everyday hand pose estimation from egocentric viewpoints. For this task, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite the recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved problem. The problem is considerably exacerbated when analyzing hands performing daily activities from a first-person viewpoint, due to severe occlusions arising from object manipulations and a limited field-of-view. Our system addresses these difficulties by exploiting strong priors over viewpoint and pose in a discriminative tracking-by-detection framework. Our priors are operationalized through a photorealistic synthetic model of egocentric scenes, which is used to generate training data for learning depth-based pose classifiers. We evaluate our approach on an annotated dataset of real egocentric object manipulation scenes and compare to both commercial and academic approaches. Our method provides state-of-the-art performance for both hand detection and pose estimation in egocentric RGB-D images.