CV · Apr 4, 2020

Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild

arXiv:2004.01946v1 · 239 citations
AI Analysis

This work addresses 3D hand pose estimation in unconstrained environments and offers a practical solution for applications such as human-computer interaction. The contribution is incremental, since it builds on existing mesh-based methods.

The paper tackles monocular 3D hand pose estimation with a network made of an image encoder and a mesh convolutional decoder. The network is trained on a large-scale dataset gathered from YouTube videos, which serves as weak supervision. The authors report state-of-the-art performance, including halving the error on an in-the-wild benchmark.

We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand action in YouTube videos and using it as a source of weak supervision. Our weakly-supervised, mesh-convolution-based system largely outperforms state-of-the-art methods, even halving the errors on the in-the-wild benchmark. The dataset and additional resources are available at https://arielai.com/mesh_hands.
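
To make "direct 3D hand mesh reconstruction loss" concrete, here is a minimal sketch of one plausible form: a mean absolute (L1) error between predicted and target vertex positions. The abstract does not specify the exact loss, so the L1 choice, the function name `vertex_l1_loss`, and the `Vertex` type are illustrative assumptions and not the authors' implementation.

```python
from typing import Sequence, Tuple

# Hypothetical type for illustration: one mesh vertex as (x, y, z).
Vertex = Tuple[float, float, float]


def vertex_l1_loss(pred: Sequence[Vertex], target: Sequence[Vertex]) -> float:
    """Mean absolute difference over all vertex coordinates.

    Assumes both meshes share the same topology, so vertex i in `pred`
    corresponds to vertex i in `target`. This is an assumed form of a
    mesh reconstruction loss, not the one confirmed by the paper.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("meshes must be non-empty and share the same vertex count")
    total = sum(
        abs(p - t)
        for pred_vertex, target_vertex in zip(pred, target)
        for p, t in zip(pred_vertex, target_vertex)
    )
    return total / (3 * len(pred))


# Toy example: a two-vertex "mesh" whose prediction is off by 1 unit
# in one coordinate of each vertex.
predicted = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
ground_truth = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0)]
loss = vertex_l1_loss(predicted, ground_truth)
```

In a real training setup the predicted vertices would come from the mesh convolutional decoder and the targets from the weakly-supervised annotations. Both are replaced by hand-written toy values here.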

Code Implementations · 4 repos