CV · Oct 6, 2018

Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images

Yiming Wu, Wei Ji, Xi Li, Gang Wang, Jianwei Yin, Fei Wu

arXiv:1810.02994v1 · 0.9

Originality: Incremental advance

AI Analysis

This paper addresses accurate, real-time hand pose estimation for applications in human-computer interaction and robotics, and represents an incremental improvement over existing methods.

The paper tackles hand pose estimation from depth images by proposing a Context-Aware Deep Spatio-Temporal Network (CADSTN) that jointly models spatial and temporal properties. It achieves the best or second-best performance among state-of-the-art methods on two benchmarks and runs at 60 fps.

As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate hand joint locations from depth images. Typically, the problem is modeled as learning a mapping function from images to hand joint coordinates in a data-driven manner. In this paper, we propose the Context-Aware Deep Spatio-Temporal Network (CADSTN), a novel method that jointly models the spatio-temporal properties for hand pose estimation. Our proposed network is able to learn representations of the spatial information and the temporal structure from image sequences. Moreover, by adopting an adaptive fusion method, the model is capable of dynamically weighting different predictions to emphasize sufficient context. Our method is evaluated on two common benchmarks; the experimental results demonstrate that our proposed approach achieves the best or the second-best performance compared with state-of-the-art methods and runs at 60 fps.
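The abstract mentions dynamically weighting different predictions but does not say how the weights are computed. As a rough illustration only, the sketch below assumes each prediction branch comes with a scalar gate score, turns those scores into weights with a softmax, and returns the weighted average of the predicted joint coordinates. The names `softmax`, `fuse_predictions`, and `gate_scores` are hypothetical and are not taken from the paper; the actual CADSTN fusion may differ.

```python
import math
from typing import List, Sequence

# A pose is a list of joints, each joint being [x, y, z].
Pose = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(predictions: Sequence[Pose],
                     gate_scores: Sequence[float]) -> Pose:
    """Weighted average of several pose predictions.

    Assumption: gate_scores holds one scalar per prediction (hypothetical;
    in a real network these would be produced by a learned layer).
    """
    if len(predictions) != len(gate_scores):
        raise ValueError("need exactly one gate score per prediction")
    weights = softmax(gate_scores)
    num_joints = len(predictions[0])
    fused = [[0.0, 0.0, 0.0] for _ in range(num_joints)]
    for weight, pose in zip(weights, predictions):
        for j, joint in enumerate(pose):
            for c in range(3):
                fused[j][c] += weight * joint[c]
    return fused


# Two hypothetical single-joint predictions with equal gate scores:
# the fused joint is their plain average.
spatial_pred = [[0.0, 0.0, 0.0]]
temporal_pred = [[2.0, 4.0, 6.0]]
fused_pose = fuse_predictions([spatial_pred, temporal_pred], [0.0, 0.0])
```

With equal gate scores the fusion reduces to a plain mean; raising one branch's score shifts the fused pose toward that branch's prediction.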
