CV | Nov 29, 2017

Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network

arXiv:1711.10872v2 | 15.960 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of accurate hand pose estimation for applications such as VR/AR, especially in egocentric views with occlusions, though it is an incremental advance over existing CNN-based methods.

The paper tackles the problem of 3D hand pose estimation from depth images, particularly under self-occlusion, by proposing a hierarchical mixture density network (HMDN) that models multiple pose modes, resulting in significant performance improvements on occlusion benchmarks and comparable results on non-occlusion benchmarks.

Learning and predicting the pose parameters of a 3D hand model from an image, such as the locations of hand joints, is challenging because of large viewpoint changes and articulations, and because of the severe self-occlusions seen particularly in egocentric views. Both feature learning and prediction modeling have been investigated to tackle the problem. Though effective, most existing discriminative methods yield a single deterministic estimate of the target pose. Because they are intrinsically single-valued mappings, they fail to adequately handle self-occlusion, where occluded joints present multiple modes. In this paper, we tackle the self-occlusion issue and provide a complete description of observed poses given an input depth image with a novel method called hierarchical mixture density networks (HMDN). The proposed method leverages state-of-the-art hand pose estimators based on Convolutional Neural Networks to facilitate feature learning, while modeling the multiple modes in a two-level hierarchy to reconcile single-valued and multi-valued mapping in its output. The whole framework, with a mixture of two differentiable density functions, is naturally end-to-end trainable. In the experiments, HMDN produces interpretable and diverse candidate samples, significantly outperforms the state-of-the-art methods on two benchmarks with occlusions, and performs comparably on another benchmark free of occlusions.
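
To make the two-level idea more concrete, the following is a minimal sketch of what a hierarchical mixture density loss for a single joint could look like. It is an illustration under stated assumptions, not the authors' implementation: it assumes the top level is a per-joint visibility probability, that a visible joint is described by one isotropic Gaussian (single-valued mapping), and that an occluded joint is described by a Gaussian mixture (multi-valued mapping). All function and parameter names (`gaussian_pdf`, `mixture_pdf`, `hierarchical_nll`, `p_visible`) are hypothetical, and in a real system the density parameters would be predicted by the CNN rather than passed in by hand.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of an isotropic Gaussian in len(y) dimensions."""
    d = len(y)
    sq_dist = sum((a - b) ** 2 for a, b in zip(y, mean))
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def mixture_pdf(y, weights, means, sigmas):
    """Density of a Gaussian mixture; weights are assumed to sum to 1."""
    return sum(
        w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas)
    )


def hierarchical_nll(y, visible, p_visible, single, mixture, eps=1e-12):
    """Negative log-likelihood of one joint location y.

    Assumed hierarchy (hypothetical, for illustration only):
      level 1: visibility, with predicted probability p_visible
      level 2: one Gaussian if visible, a Gaussian mixture if occluded
    `single` is (mean, sigma); `mixture` is (weights, means, sigmas).
    """
    if visible:
        density = p_visible * gaussian_pdf(y, *single)
    else:
        density = (1.0 - p_visible) * mixture_pdf(y, *mixture)
    return -math.log(density + eps)


# An occluded joint with two equally likely locations (two modes).
single = ([0.0, 0.0, 0.0], 0.2)
mixture = ([0.5, 0.5], [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [0.2, 0.2])

loss_at_mode = hierarchical_nll([1.0, 0.0, 0.0], False, 0.3, single, mixture)
loss_at_average = hierarchical_nll([0.0, 0.0, 0.0], False, 0.3, single, mixture)
print(loss_at_mode < loss_at_average)
```

Under these assumptions, a target that sits on either mode is scored as far more likely than a target at the average of the two modes. That is the behavior a single deterministic estimate cannot express, since a network trained to output one value for an ambiguous joint tends toward the average.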

