CV | Mar 22, 2021

Model-based 3D Hand Reconstruction via Self-Supervised Learning

arXiv:2103.11703v1 | 126 citations
Originality: Incremental advance
AI Analysis

This work addresses the cost of 3D data annotation for hand reconstruction, offering a more accessible route for applications such as VR/AR and robotics. It is an incremental advance, since it builds on existing self-supervised ideas.

The paper tackles the problem of reconstructing a 3D hand from a single RGB image without expensive 3D annotations by proposing S2HAND, a self-supervised network that estimates pose, shape, texture, and viewpoint, achieving comparable performance to fully-supervised methods with less supervision.

Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we utilize the consistency between 2D and 3D representations and propose a set of novel losses to rationalize outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed method achieves comparable performance with recent fully-supervised methods while using fewer supervision data.
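
The 2D-3D consistency idea in the abstract can be illustrated with a minimal sketch. This is not the paper's loss: it assumes a weak-perspective camera (a scale and a 2D translation) and a per-keypoint detector confidence, and the function names `project_weak_perspective` and `keypoint_consistency_loss` are hypothetical. It only shows how noisy detected 2D keypoints could supervise predicted 3D joints without any 3D labels.

```python
import numpy as np


def project_weak_perspective(joints_3d, scale, trans_2d):
    """Project (J, 3) joints to (J, 2) image points.

    Assumed camera model: drop depth, then apply a scale and a 2D shift.
    """
    return scale * joints_3d[:, :2] + trans_2d


def keypoint_consistency_loss(joints_3d, keypoints_2d, confidence, scale, trans_2d):
    """Confidence-weighted mean squared distance between projected 3D joints
    and detected 2D keypoints. Low-confidence detections contribute less."""
    projected = project_weak_perspective(joints_3d, scale, trans_2d)
    sq_dist = np.sum((projected - keypoints_2d) ** 2, axis=1)
    return float(np.sum(confidence * sq_dist) / max(np.sum(confidence), 1e-8))


# Usage with 21 hand joints (the count is an assumption for illustration).
rng = np.random.default_rng(0)
joints = rng.normal(size=(21, 3))
trans = np.array([112.0, 112.0])
detected = project_weak_perspective(joints, 100.0, trans)
detected[0] += np.array([3.0, 4.0])  # simulate one noisy detection
conf = np.ones(21)
loss = keypoint_consistency_loss(joints, detected, conf, 100.0, trans)
```

Setting `conf[0] = 0` in this example removes the noisy detection from the loss, which is one simple way to make training tolerant of unreliable keypoints.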

Code Implementations: 1 repo