Reconstructing Hand-Object Interactions in the Wild
This work is relevant to researchers and practitioners in computer vision and robotics who need to analyze or synthesize realistic hand-object interactions in unconstrained environments; it offers an incremental improvement in reconstruction quality.
This paper addresses the challenge of reconstructing 3D hand-object interactions from in-the-wild data, where 3D labels are scarce. The authors propose an optimization-based method that jointly optimizes hand and object poses using various 2D and 3D constraints. It produces compelling reconstructions on challenging in-the-wild datasets and compares favorably with existing approaches on lab-setting benchmarks where 3D ground truth is available.
In this work we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure that does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction. Rather than optimizing the hand and object individually, we optimize them jointly, which allows us to impose additional constraints based on hand-object contact, collision, and occlusion. Our method produces compelling reconstructions on the challenging in-the-wild data from the EPIC Kitchens and the 100 Days of Hands datasets, across a range of object categories. Quantitatively, we demonstrate that our approach compares favorably to existing approaches in lab settings where ground truth 3D annotations are available.
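Why joint optimization helps can be shown with a deliberately tiny toy problem. This is not the authors' procedure. The spherical object, the single fingertip, the pinhole camera with `FOCAL = 500`, the unit loss weights, and the function names `back_project`, `joint_loss` and `fit_jointly` are all hypothetical. A brute-force grid search also stands in for the gradient-based fitting a real system would use. Each point is kept on the camera ray of its 2D detection, so only depths remain free. The object's known 3D size and its apparent 2D size fix the object depth. The fingertip depth, which a single 2D keypoint cannot determine, is then pinned down only by the contact, collision, and occlusion terms that couple the hand to the object.

```python
import numpy as np

# Toy pinhole camera (assumption): focal length in pixels, principal point at the origin.
FOCAL = 500.0


def back_project(uv, depth):
    """Hypothetical helper: lift pixel (u, v) to the 3D point at `depth` on its camera ray.

    Keeping points on their rays satisfies the 2D keypoint / box-center constraints
    exactly, so only depth is left to optimize.
    """
    u, v = uv
    depth = np.asarray(depth, dtype=float)
    return np.stack([u / FOCAL * depth, v / FOCAL * depth, depth], axis=-1)


def joint_loss(z_obj, z_hand, obj_uv, obj_radius_px, hand_uv, obj_radius_m,
               w_size=1.0, w_contact=1.0, w_collision=1.0, w_occlusion=1.0):
    """Hypothetical joint objective over object depth and fingertip depth (weights arbitrary)."""
    center = back_project(obj_uv, z_obj)
    tip = back_project(hand_uv, z_hand)
    dist = np.linalg.norm(tip - center, axis=-1)

    # Object term: the projected size of the known 3D model should match its 2D mask.
    size = (FOCAL * obj_radius_m / z_obj - obj_radius_px) ** 2
    # Contact: the fingertip should lie on the object surface.
    contact = (dist - obj_radius_m) ** 2
    # Collision: penalize the fingertip penetrating the object.
    collision = np.maximum(obj_radius_m - dist, 0.0) ** 2
    # Occlusion: the hand occludes the object in the image, so it should be closer.
    occlusion = np.maximum(z_hand - z_obj, 0.0) ** 2
    return (w_size * size + w_contact * contact
            + w_collision * collision + w_occlusion * occlusion)


def fit_jointly(obj_uv, obj_radius_px, hand_uv, obj_radius_m):
    """Brute-force joint minimization over a depth grid (stand-in for gradient descent)."""
    depths = np.linspace(0.2, 1.0, 801)  # candidate depths in meters, 1 mm apart
    z_obj, z_hand = np.meshgrid(depths, depths, indexing="ij")
    loss = joint_loss(z_obj, z_hand, obj_uv, obj_radius_px, hand_uv, obj_radius_m)
    i, j = np.unravel_index(np.argmin(loss), loss.shape)
    return depths[i], depths[j]


# A 5 cm spherical object seen with a 50 px radius; fingertip detected 30 px to its right.
z_obj, z_hand = fit_jointly(obj_uv=(0.0, 0.0), obj_radius_px=50.0,
                            hand_uv=(30.0, 0.0), obj_radius_m=0.05)
print(f"object depth {z_obj:.3f} m, fingertip depth {z_hand:.3f} m")
```

With these inputs the contact term alone has two zeros, with the fingertip touching either the near or the far side of the sphere. The occlusion term is what selects the near side, which illustrates why reasoning about hand and object together constrains the solution more than fitting each one separately.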