CV · Apr 12, 2016

Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation

arXiv:1604.03334v2 · 160 citations
AI Analysis

This work addresses the challenge of generating kinematically plausible hand poses in computer vision. The contribution is incremental, since it builds on existing hierarchical and hybrid approaches.

The paper tackles 3D hand pose estimation with a hybrid method that applies a kinematic hierarchy strategy to both the input and output spaces, using a spatial attention mechanism and hierarchical PSO. It reports significantly outperforming four state-of-the-art methods and three baselines on three public benchmarks.

Discriminative methods often generate kinematically implausible hand poses, so hybrid methods use a generative method to correct (or verify) these results. Estimating 3D hand pose in a hierarchy, where the high-dimensional output space is decomposed into smaller ones, has been shown to be effective. Existing hierarchical methods mainly focus on decomposing the output space, while the input space remains almost the same along the hierarchy. In this paper, a hybrid hand pose estimation method is proposed that applies the kinematic hierarchy strategy to the input space (as well as the output space) of the discriminative method through a spatial attention mechanism, and to the optimization of the generative method through hierarchical Particle Swarm Optimization (PSO). The spatial attention mechanism integrates cascaded and hierarchical regression into a CNN framework by transforming both the input (and feature) space and the output space, which greatly reduces viewpoint and articulation variations. Between the levels of the hierarchy, the hierarchical PSO enforces kinematic constraints on the results of the CNNs. The experimental results show that our method significantly outperforms four state-of-the-art methods and three baselines on three public benchmarks.
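To make the PSO step more concrete, here is a minimal sketch of a generic particle swarm optimizer applied to a toy kinematic constraint. It is not the paper's hierarchical or partial PSO: the function `pso_minimize`, the `energy` function, the bone length of 2.0, the penalty weight of 100, and the "CNN estimate" are all hypothetical values chosen for illustration. The idea shown is only that a discriminative estimate violating a bone-length constraint can be pulled toward a kinematically plausible position by minimizing an energy with a constraint penalty.

```python
import math
import random


def pso_minimize(energy, lower, upper, n_particles=30, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize energy(x) over the box [lower, upper] with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(lower)
    # Positions start uniformly inside the box, velocities at zero.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # per-particle best positions
    best_val = [energy(p) for p in pos]         # per-particle best energies
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = energy(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy setup (all values hypothetical): the parent joint sits at the origin,
# the bone to the child joint should have length 2.0, and a made-up "CNN
# estimate" places the child at distance 3.0, which violates that constraint.
cnn_estimate = [3.0, 0.0]


def energy(x):
    # Data term: stay close to the discriminative estimate.
    data = (x[0] - cnn_estimate[0]) ** 2 + (x[1] - cnn_estimate[1]) ** 2
    # Constraint term: penalize deviation from the assumed bone length.
    bone = math.hypot(x[0], x[1])
    return data + 100.0 * (bone - 2.0) ** 2


pose, value = pso_minimize(energy, lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(pose, value)
```

In this sketch the penalty weight trades off fidelity to the estimate against the constraint. A hierarchical scheme in the spirit of the abstract would run such an optimization on a subset of joints at each level instead of on the full pose at once.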

Code Implementations: 1 repo
