Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera
The method enables marker-less, real-time tracking of two hands for applications such as VR and AR. It addresses a domain-specific problem, and its improvements over prior work, mainly in handling hand interactions, are incremental.
The paper tackles real-time pose and shape reconstruction of two interacting hands from a single depth camera. It reports state-of-the-art results in complex scenes, such as tight two-hand grasps and significant inter-hand occlusions, while running in real time.
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach is the first two-hand tracking solution that combines an extensive list of favorable properties, namely it is marker-less, uses a single consumer-level depth camera, runs in real time, handles inter- and intra-hand collisions, and automatically adjusts to the user's hand shape. In order to achieve this, we embed a recent parametric hand pose and shape model and a dense correspondence predictor based on a deep neural network into a suitable energy minimization framework. For training the correspondence prediction network, we synthesize a two-hand dataset based on physical simulations that includes both hand pose and shape annotations while at the same time avoiding inter-hand penetrations. To achieve real-time rates, we phrase the model fitting in terms of a nonlinear least-squares problem so that the energy can be optimized based on a highly efficient GPU-based Gauss-Newton optimizer. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work, including tight two-hand grasps, significant inter-hand occlusions, and gesture interaction.
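To make the nonlinear least-squares formulation more concrete, the following is a minimal CPU sketch of a Gauss-Newton loop on a toy problem. It is not the paper's GPU solver or energy. The toy residual (aligning 2D points to given correspondences with a rotation angle and a translation), the function names, and the dense 3x3 solve are all assumptions chosen for illustration. The actual method fits a parametric hand pose and shape model with several energy terms.

```python
import math

def residuals(params, src, dst):
    """Stacked residuals R(theta) * p + t - q over all correspondences (toy energy)."""
    theta, tx, ty = params
    c, s = math.cos(theta), math.sin(theta)
    r = []
    for (px, py), (qx, qy) in zip(src, dst):
        r.append(c * px - s * py + tx - qx)
        r.append(s * px + c * py + ty - qy)
    return r

def jacobian(params, src):
    """Analytic Jacobian of the residuals w.r.t. (theta, tx, ty)."""
    theta = params[0]
    c, s = math.cos(theta), math.sin(theta)
    J = []
    for px, py in src:
        J.append([-s * px - c * py, 1.0, 0.0])
        J.append([c * px - s * py, 0.0, 1.0])
    return J

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gauss_newton(src, dst, params=(0.0, 0.0, 0.0), iters=20, tol=1e-12):
    """Minimize the sum of squared residuals by repeated linearization."""
    params = list(params)
    n = len(params)
    for _ in range(iters):
        r = residuals(params, src, dst)
        J = jacobian(params, src)
        # Normal equations: (J^T J) delta = -J^T r
        JTJ = [[sum(row[i] * row[j] for row in J) for j in range(n)]
               for i in range(n)]
        JTr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(n)]
        delta = solve(JTJ, [-g for g in JTr])
        params = [p + d for p, d in zip(params, delta)]
        if sum(d * d for d in delta) < tol:
            break
    return params

# Usage: recover a known transform from noise-free synthetic correspondences.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.5)]
true_params = (0.3, 1.0, -0.5)
c, s = math.cos(true_params[0]), math.sin(true_params[0])
dst = [(c * x - s * y + true_params[1], s * x + c * y + true_params[2])
       for x, y in src]
estimate = gauss_newton(src, dst)
```

Each iteration linearizes the residuals around the current parameters and solves the normal equations for an update. A real-time system would presumably exploit the structure of the Jacobian and parallelize its assembly, which this dense toy version does not attempt.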