CV | AI | May 23, 2021

A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks

arXiv:2105.10902v1 | 5 citations
Originality: Incremental advance
AI Analysis

This work addresses occlusion and depth ambiguities in hand pose estimation for augmented reality and human-computer interaction applications, representing an incremental improvement over existing GCN-based methods.

The paper tackles 3D hand pose estimation from single RGB images by proposing a two-stage GCN framework that learns per-pose relationship constraints, resulting in a model that outperforms state-of-the-art methods on two public datasets.

Hand pose estimation is a crucial part of a wide range of augmented reality and human-computer interaction applications. Predicting the 3D hand pose from a single RGB image is challenging due to occlusion and depth ambiguities. Methods based on graph convolutional networks (GCNs) exploit the structural similarity between graphs and hand joints to model kinematic dependencies between joints. These techniques use predefined or globally learned joint relationships, which may fail to capture pose-dependent constraints. To address this problem, we propose a two-stage GCN-based framework that learns per-pose relationship constraints. Specifically, the first stage quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality. This spatial dependency information guides the first stage to estimate reliable 2D and 3D poses. The second stage further improves the 3D estimation through a GCN-based module that uses an adaptive nearest neighbor algorithm to determine joint relationships. Extensive experiments show that our multi-stage GCN approach yields an efficient model that produces accurate 2D/3D hand poses and outperforms the state-of-the-art on two public datasets.
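The two ideas named in the abstract, quantizing space into blocks and deriving joint relationships from nearest neighbors, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the uniform grid, the coordinate range, and the fixed `k` are all assumptions made for illustration, and a plain fixed-k nearest neighbor search stands in for the adaptive variant, whose details the abstract does not give.

```python
import math

def quantize(coord, low, high, bins):
    """Map one coordinate in [low, high] to a block index in [0, bins - 1].

    Assumption: a uniform grid. The abstract does not specify the block layout.
    """
    t = (coord - low) / (high - low)
    idx = int(t * bins)
    # Clamp so that coord == high (or slightly out of range) stays in a valid block.
    return min(max(idx, 0), bins - 1)

def block_labels(joints, low, high, bins):
    """Per-joint tuple of block indices, one index per axis (2D or 3D)."""
    return [tuple(quantize(c, low, high, bins) for c in joint) for joint in joints]

def block_center(index, low, high, bins):
    """Coordinate at the center of a block, a coarse estimate a regressor could refine."""
    width = (high - low) / bins
    return low + (index + 0.5) * width

def knn_adjacency(joints, k):
    """Symmetric 0/1 adjacency linking each joint to its k nearest joints.

    Assumption: fixed k and Euclidean distance, as a stand-in for the
    adaptive nearest neighbor algorithm mentioned in the abstract.
    """
    n = len(joints)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(joints[i], joints[j]),
        )
        for j in others[:k]:
            adj[i][j] = 1
            adj[j][i] = 1
    return adj

# Hypothetical toy pose: four joints in a unit cube, forming two tight clusters.
joints = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (0.9, 0.9, 0.9), (0.8, 0.9, 0.9)]
labels = block_labels(joints, low=0.0, high=1.0, bins=4)
adjacency = knn_adjacency(joints, k=1)
```

Because the adjacency is rebuilt from the estimated joint positions of each input, the resulting graph differs from pose to pose, which is the sense in which such relationships are "per-pose" rather than predefined. In a classification-regression hybrid, the block index would serve as the classification target and the offset from `block_center` as a possible regression target; that split is an interpretation of the title, not a detail confirmed by the abstract.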

