CV | Jul 23, 2017

Towards Good Practices for Deep 3D Hand Pose Estimation

arXiv:1707.07248v1 | 84 citations
Originality: Incremental advance
AI Analysis

This paper addresses accurate 3D hand pose estimation for human-computer interaction and represents an incremental improvement over existing deep learning methods.

The paper tackles 3D hand pose estimation from single depth images by proposing a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It reports state-of-the-art performance on three public hand pose datasets, as well as state-of-the-art accuracy on fingertip detection and human pose datasets.

3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional random-forest-based methods is not so apparent. To exploit good practices and improve the performance of hand pose estimation, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolution outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and smooth $L_1$ loss, the proposed REN can significantly improve the performance of the ConvNet in localizing hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our methods on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.
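
The region-ensemble idea in the abstract (split the last feature map into grid regions, run one FC regressor per region, fuse the results with another FC layer, train with smooth $L_1$ loss) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the non-overlapping 2 x 2 grid, the layer widths, the random weights, and the helper names (`split_into_regions`, `make_fc`, `fc`, `ren_forward`) are all assumptions made for illustration, and the smooth $L_1$ loss uses the common form with a threshold of 1.

```python
import random


def smooth_l1(pred, target):
    """Smooth L1 loss summed over coordinates (common form, threshold 1):
    quadratic for |d| < 1, linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def split_into_regions(fmap, rows, cols):
    """Partition a C x H x W feature map (nested lists) into rows * cols
    non-overlapping grid regions, each flattened to a vector.
    Assumption: H and W are divisible by rows and cols; any remainder
    rows or columns would be ignored."""
    height, width = len(fmap[0]), len(fmap[0][0])
    rh, rw = height // rows, width // cols
    regions = []
    for r in range(rows):
        for c in range(cols):
            regions.append([channel[y][x]
                            for channel in fmap
                            for y in range(r * rh, (r + 1) * rh)
                            for x in range(c * rw, (c + 1) * rw)])
    return regions


def make_fc(n_in, n_out, rng):
    """Hypothetical FC layer: small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fc(layer, x, relu=False):
    """Apply an FC layer to vector x, optionally followed by ReLU."""
    weights, bias = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def ren_forward(fmap, branch_layers, fusion_layer, rows, cols):
    """One FC branch per grid region; branch outputs are concatenated
    and fused by a final FC layer that regresses 3D joint coordinates."""
    fused_input = []
    regions = split_into_regions(fmap, rows, cols)
    for layer, region in zip(branch_layers, regions):
        fused_input.extend(fc(layer, region, relu=True))
    return fc(fusion_layer, fused_input)


# Toy usage: a random 2 x 4 x 4 "feature map", a 2 x 2 grid, 3 joints.
rng = random.Random(0)
channels, height, width = 2, 4, 4
rows = cols = 2
hidden, n_joints = 8, 3
fmap = [[[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(channels)]
region_size = channels * (height // rows) * (width // cols)
branches = [make_fc(region_size, hidden, rng) for _ in range(rows * cols)]
fusion = make_fc(hidden * rows * cols, 3 * n_joints, rng)
pred = ren_forward(fmap, branches, fusion, rows, cols)
loss = smooth_l1(pred, [0.0] * (3 * n_joints))
```

Here `pred` holds one (x, y, z) triple per joint, flattened into a single vector. In a real network the feature map would come from trained convolution layers and all FC weights would be learned jointly by minimizing the loss.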

