Towards unconstrained joint hand-object reconstruction from RGB videos
This work addresses the need for unconstrained hand-object reconstruction in robotics and in learning from human demonstrations, although it is incremental because it builds on cues from existing methods.
The paper tackles 3D reconstruction of hands and manipulated objects from monocular videos. It proposes a learning-free fitting approach that handles two-hand interactions and applies to datasets for which no training data is available.
Our work aims to obtain 3D reconstructions of hands and manipulated objects from monocular videos. Reconstructing hand-object manipulations holds great potential for robotics and for learning from human demonstrations. The supervised learning approach to this problem, however, requires 3D supervision and therefore remains limited to constrained laboratory settings and simulators for which 3D ground truth is available. In this paper, we first propose a learning-free fitting approach for hand-object reconstruction that can seamlessly handle two-hand object interactions. Our method relies on cues obtained with common methods for object detection, hand pose estimation, and instance segmentation. We quantitatively evaluate our approach and show that it can be applied to datasets of varying difficulty for which no training data is available.