CV · Dec 6, 2020

DGGAN: Depth-image Guided Generative Adversarial Networks for Disentangling RGB and Depth Images in 3D Hand Pose Estimation

Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Wei Fan, Xiaohui Xie

arXiv:2012.03197v13.329 citations


AI Analysis

This work addresses the challenge of 3D hand pose estimation for computer vision applications by eliminating the need for ground-truth depth maps during training, which simplifies data requirements for researchers and developers.

This paper introduces DGGAN, a conditional GAN that generates realistic depth maps from RGB images. The synthesized depth maps are then used to regularize 3D hand pose estimation models during training, achieving state-of-the-art results and reducing the mean 3D end-point error (EPE) by 4.7% on RHD, 16.5% on STB, and 6.8% on MHP.

Estimating 3D hand poses from RGB images is essential to a wide range of potential applications, but is challenging owing to substantial ambiguity in the inference of depth information from RGB images. State-of-the-art estimators address this problem by regularizing 3D hand pose estimation models during training to enforce the consistency between the predicted 3D poses and the ground-truth depth maps. However, these estimators rely on both RGB images and the paired depth maps during training. In this study, we propose a conditional generative adversarial network (GAN) model, called Depth-image Guided GAN (DGGAN), to generate realistic depth maps conditioned on the input RGB image, and use the synthesized depth maps to regularize the 3D hand pose estimation model, therefore eliminating the need for ground-truth depth maps. Experimental results on multiple benchmark datasets show that the synthesized depth maps produced by DGGAN are quite effective in regularizing the pose estimation model, yielding new state-of-the-art results in estimation accuracy, notably reducing the mean 3D end-point errors (EPE) by 4.7%, 16.5%, and 6.8% on the RHD, STB and MHP datasets, respectively.
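To make the reported metric concrete, here is a minimal sketch of how a mean 3D end-point error is commonly computed: the average Euclidean distance between predicted and ground-truth joint positions. The function names `mean_epe` and `relative_reduction`, the toy joint coordinates, and the reading of the reported percentages as relative reductions against a baseline are all assumptions for illustration; none of them come from the paper's code.

```python
import math

def mean_epe(pred, gt):
    """Average Euclidean distance between matching 3D joints.

    pred, gt: equal-length sequences of (x, y, z) tuples.
    (Hypothetical helper, not taken from the paper.)
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers `baseline` (assumed definition)."""
    return 100.0 * (baseline - improved) / baseline

# Toy data: two joints, one displaced by a 3-4-5 offset, one exact.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]

toy_epe = mean_epe(pred, gt)                    # (5 + 0) / 2
toy_reduction = relative_reduction(10.0, 9.5)   # a made-up baseline and result
```

Under this reading, a 4.7% reduction would mean the new mean EPE is 95.3% of the baseline value; the listing itself does not state the absolute errors.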

