CV | Dec 5, 2018

Point-to-Pose Voting based Hand Pose Estimation using Residual Permutation Equivariant Layer

arXiv:1812.02050v1 | 80 citations
Originality: Incremental advance
AI Analysis

This work addresses hand pose estimation for applications such as human-computer interaction. Its approach avoids the memory cost of voxel-based methods and the preprocessing burden of PointNet-based methods, though it is an incremental improvement over existing techniques.

The paper tackles hand pose estimation from unordered 3D point clouds by proposing a method using residual permutation equivariant layers and a point-to-pose voting scheme, achieving state-of-the-art accuracy on the Hands2017Challenge dataset.

Methods that estimate hand pose from 3D input data have recently achieved state-of-the-art performance, because 3D data capture more spatial information than a depth image. However, 3D voxel-based methods need a large amount of memory, and PointNet-based methods need tedious preprocessing steps such as a K-nearest-neighbour search for each point. In this paper, we present a novel deep learning method for hand pose estimation from an unordered point cloud. Our method takes 1024 3D points as input and does not require additional information. We use the Permutation Equivariant Layer (PEL) as the basic element and propose a residual network version of PEL for the hand pose estimation task. Furthermore, we propose a voting-based scheme to merge information from individual points into the final pose output. In addition to pose estimation, the voting-based scheme can provide a point cloud segmentation result without ground-truth segmentation labels. We evaluate our method on both the NYU dataset and the Hands2017Challenge dataset. Our method outperforms recent state-of-the-art methods, and its pose accuracy is currently the best on the Hands2017Challenge dataset.
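
The two ideas in the abstract, a residual block built from permutation equivariant layers and a vote that merges per-point predictions into one pose, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The layer form (a per-point linear term plus a max-pooled term shared by all points), the softmax normalisation of the vote weights, the argmax-based segmentation, the random untrained weights, the output size of 6, and all function names are assumptions made for illustration.

```python
import math
import random

def matmul(x, w):
    """Multiply an (N, D_in) list of rows by a (D_in, D_out) weight matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def pel(x, w_point, w_pool, bias):
    """Assumed PEL form: per-point linear term plus a max-pooled term shared by all points."""
    pooled = [max(col) for col in zip(*x)]      # order-independent summary of the cloud
    local = matmul(x, w_point)                  # applied to every point identically
    shared = matmul([pooled], w_pool)[0]        # same vector added to every point
    return [[l + s + b for l, s, b in zip(row, shared, bias)] for row in local]

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def residual_pel(x, params):
    """Residual block x + PEL(relu(PEL(x))); requires equal input and output width."""
    h = pel(relu(pel(x, *params[0])), *params[1])
    return [[a + b for a, b in zip(rx, rh)] for rx, rh in zip(x, h)]

def vote(poses, scores):
    """Merge per-point pose votes, weighting each point by a softmax over its scores."""
    merged, weights = [], []
    for j in range(len(poses[0])):
        col = [row[j] for row in scores]
        top = max(col)                          # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in col]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        merged.append(sum(wi * row[j] for wi, row in zip(w, poses)))
    return merged, weights

def segment(weights):
    """Assumed segmentation: label each point with the output it votes for most strongly."""
    n_points = len(weights[0])
    return [max(range(len(weights)), key=lambda j: weights[j][i]) for i in range(n_points)]

def make_layer(rng, d_in, d_out):
    """Random, untrained parameters (hypothetical helper for the demo)."""
    mat = lambda: [[rng.gauss(0.0, 0.1) for _ in range(d_out)] for _ in range(d_in)]
    return (mat(), mat(), [0.0] * d_out)

rng = random.Random(0)
block = (make_layer(rng, 3, 3), make_layer(rng, 3, 3))
pose_head = make_layer(rng, 3, 6)               # 6 outputs, e.g. two 3D joints
score_head = make_layer(rng, 3, 6)

def estimate_pose(cloud):
    feats = residual_pel(cloud, block)
    return vote(pel(feats, *pose_head), pel(feats, *score_head))

# 1024 random 3D points stand in for a segmented hand point cloud.
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1024)]
pose, weights = estimate_pose(points)
labels = segment(weights)
```

Because every step either treats points independently or pools over all of them, shuffling `points` shuffles the per-point features in the same way and leaves `pose` unchanged up to floating-point rounding. This is why such a network can consume an unordered point cloud without a neighbour search.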

