CV · Nov 18, 2018

RGB-based 3D Hand Pose Estimation via Privileged Learning with Depth Images

arXiv:1811.07376v1 · 26 citations
Originality: Incremental advance
AI Analysis

The paper addresses accurate hand pose estimation from RGB images for applications such as human-computer interaction. The advance is incremental, since it builds on existing depth data.

The paper tackles 3D hand pose estimation from RGB images by using depth images as privileged information during training, and reports outperforming state-of-the-art RGB-based methods on three public datasets.

This paper proposes a method for hand pose estimation from RGB images that uses both external large-scale depth image datasets and paired depth and RGB images as privileged information at training time. We show that providing depth information during training significantly improves the performance of pose estimation from RGB images during testing. We explore different ways of using this privileged information: (1) using depth data to initially train a depth-based network, (2) using the features from the depth-based network of the paired depth images to constrain mid-level RGB network weights, and (3) using the foreground mask, obtained from the depth data, to suppress the responses from the background area. By using paired RGB and depth images, we are able to supervise the RGB-based network to learn middle-layer features that mimic those of the corresponding depth-based network, which is trained on large-scale, accurately annotated depth data. During testing, when only an RGB image is available, our method produces accurate 3D hand pose predictions. Our method is also tested on 2D hand pose estimation. Experiments on three public datasets show that the method outperforms the state-of-the-art methods for hand pose estimation using RGB image input.
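
The abstract does not give the training objective, so the following is only a minimal sketch of how uses (2) and (3) of the privileged information might combine with a pose regression term. The function name `privileged_loss`, the weight `lam`, the squared-error form of both terms, and the choice to apply the mask before the mimicry term are all assumptions for illustration, not the authors' formulation. The depth-network features are treated as fixed targets produced by the network pre-trained in step (1).

```python
import numpy as np

def privileged_loss(rgb_feat, depth_feat, pred_pose, gt_pose, fg_mask, lam=1.0):
    """Hypothetical training loss: pose regression plus depth-feature mimicry.

    rgb_feat, depth_feat: (C, H, W) mid-level feature maps from the two networks.
    pred_pose, gt_pose:   (J, 3) predicted and annotated 3D joint positions.
    fg_mask:              (H, W) binary hand mask derived from the depth image.
    lam:                  assumed weight balancing the two terms.
    """
    # (3) Suppress background responses in the RGB features with the depth-derived mask.
    masked_rgb = rgb_feat * fg_mask[None, :, :]
    # (2) Penalise the distance between RGB features and the paired depth features.
    mimic = np.mean((masked_rgb - depth_feat) ** 2)
    # Pose term: mean squared Euclidean error over joints.
    pose = np.mean(np.sum((pred_pose - gt_pose) ** 2, axis=-1))
    return pose + lam * mimic, pose, mimic

# Toy shapes only; real feature maps and poses would come from the two networks.
rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(8, 4, 4))
depth_feat = rng.normal(size=(8, 4, 4))
fg_mask = (rng.random((4, 4)) > 0.5).astype(float)
total, pose, mimic = privileged_loss(
    rgb_feat, depth_feat, rng.normal(size=(21, 3)), rng.normal(size=(21, 3)), fg_mask
)
```

Only the training loss touches `depth_feat` and `fg_mask`. At test time the RGB network runs alone, which matches the abstract's setting where only an RGB image is available.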

