Training a Feedback Loop for Hand Pose Estimation
This work addresses hand pose estimation for applications such as human-computer interaction, but the contribution is incremental because it builds on existing deep learning methods.
The paper tackles the problem of 3D hand pose estimation from depth images by introducing a data-driven feedback loop to correct errors from a Convolutional Neural Network, achieving state-of-the-art performance with an implementation running at over 400 fps on a single GPU.
We propose an entirely data-driven approach to estimating the 3D pose of a hand given a depth image. We show that we can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also Deep Networks, optimized using training data. They remove the need for fitting a 3D model to the input data, which requires both a carefully designed fitting function and algorithm. We show that our approach outperforms state-of-the-art methods, and is efficient as our implementation runs at over 400 fps on a single GPU.
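The control flow of such a feedback loop can be illustrated with a minimal sketch. This is not the authors' implementation. The names `predictor`, `synthesizer`, `updater`, and `refine`, the split of the loop into these parts, the dimensions, and the linear stand-ins used in place of trained deep networks are all assumptions made for illustration. The sketch only shows the idea of starting from an initial estimate and repeatedly applying a correction computed from the input.

```python
import numpy as np

rng = np.random.default_rng(0)
POSE_DIM, IMAGE_DIM = 6, 32  # hypothetical sizes, not taken from the paper

# Toy linear "rendering" model standing in for trained networks (assumption).
A = rng.normal(size=(IMAGE_DIM, POSE_DIM))
A_pinv = np.linalg.pinv(A)


def synthesizer(pose):
    """Stand-in for a learned network that maps a pose to an image."""
    return A @ pose


def predictor(image):
    """Stand-in for the initial CNN: returns a deliberately imperfect pose."""
    return 0.7 * (A_pinv @ image)


def updater(image, synthesized):
    """Stand-in for a learned network that proposes a pose correction
    from the observed image and the image synthesized from the current pose."""
    return 0.5 * (A_pinv @ (image - synthesized))


def refine(image, n_iters=10):
    """Feedback loop: start from the predictor's estimate, then correct it."""
    pose = predictor(image)
    for _ in range(n_iters):
        pose = pose + updater(image, synthesizer(pose))
    return pose


true_pose = rng.normal(size=POSE_DIM)
image = synthesizer(true_pose)

initial_error = np.linalg.norm(predictor(image) - true_pose)
refined_error = np.linalg.norm(refine(image) - true_pose)
print(f"initial error: {initial_error:.4f}, refined error: {refined_error:.6f}")
```

In this toy setting each iteration halves the remaining error, so the refined estimate ends up much closer to the true pose than the initial one. With real networks, the correction step would be learned from training data rather than derived from a known linear model.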