CV, Apr 14, 2022

What's in your hands? 3D Reconstruction of Generic Objects in Hands

arXiv:2204.07153v10.35114 citations, h-index: 95
AI Analysis 55

This paper addresses the challenge of 3D reconstruction of arbitrary objects held in human hands. The contribution is incremental: it builds on prior work that assumed known object templates and removes that assumption.

The paper tackles the problem of reconstructing 3D shapes of generic hand-held objects from a single RGB image, without relying on known 3D templates, by leveraging hand articulation as a predictive cue, and demonstrates consistent outperformance over baselines across three datasets.

Our work aims to reconstruct hand-held objects given a single RGB image. In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates. Our key insight is that hand articulation is highly predictive of the object shape, and we propose an approach that conditionally reconstructs the object based on the articulation and the visual input. Given an image depicting a hand-held object, we first use off-the-shelf systems to estimate the underlying hand pose and then infer the object shape in a normalized hand-centric coordinate frame. We parameterize the object by a signed distance field, which is inferred by an implicit network that leverages information from both visual features and articulation-aware coordinates to process a query point. We perform experiments across three datasets and show that our method consistently outperforms baselines and is able to reconstruct a diverse set of objects. We analyze the benefits and robustness of explicit articulation conditioning and also show that this allows the hand pose estimation to be further improved through test-time optimization.
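The conditioning described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "articulation-aware coordinates" means expressing the query point in each hand joint's local frame, and it uses a tiny, untrained two-layer MLP with random weights in place of the real implicit network. The names `articulation_aware_coords`, `make_mlp`, `predict_sdf`, and the `joint_transforms` input are hypothetical.

```python
import numpy as np

def articulation_aware_coords(x, joint_transforms):
    """Express a hand-frame query point x (3,) in every joint's local frame.

    joint_transforms: (J, 4, 4) rigid joint-to-hand-frame transforms.
    This input format is an assumption made for illustration.
    """
    x_h = np.append(x, 1.0)                    # homogeneous point, shape (4,)
    hand_to_joint = np.linalg.inv(joint_transforms)  # (J, 4, 4)
    local = hand_to_joint @ x_h                # (J, 4)
    return local[:, :3].reshape(-1)            # flattened to (3 * J,)

def make_mlp(in_dim, hidden, rng):
    """Random (untrained) weights for a two-layer MLP; illustrative only."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 1)),
        "b2": np.zeros(1),
    }

def predict_sdf(x, visual_feat, joint_transforms, params):
    """Map (query point, articulation coords, visual feature) to one signed distance."""
    inp = np.concatenate(
        [x, articulation_aware_coords(x, joint_transforms), visual_feat]
    )
    h = np.maximum(inp @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    return float((h @ params["W2"] + params["b2"])[0])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J, feat_dim = 3, 8                         # toy sizes, not the paper's
    joints = np.tile(np.eye(4), (J, 1, 1))     # identity poses as a stand-in
    joints[1, :3, 3] = [1.0, 0.0, 0.0]         # shift one joint along x
    params = make_mlp(3 + 3 * J + feat_dim, 16, rng)
    query = np.array([0.5, 0.2, -0.1])
    print(predict_sdf(query, rng.normal(size=feat_dim), joints, params))
```

In a trained system the visual feature would come from an image encoder and the joint transforms from the off-the-shelf hand pose estimator; here both are placeholders so the data flow can be run end to end.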

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
