CVJan 18, 2023

HMDO: Markerless Multi-view Hand Manipulation Capture with Deformable Objects

arXiv:2301.07652v116 citationsh-index: 7
Originality Incremental advance
AI Analysis

This addresses the lack of datasets for hand-deformable object interactions, which is a domain-specific problem for computer vision and robotics researchers, though the method is incremental as it builds on existing capture and reconstruction techniques.

The authors tackled the challenge of reconstructing 3D hand and deformable object interactions by creating the first markerless multi-view dataset (HMDO) and proposing a method that uses hand features to guide object pose estimation and optimize deformation, achieving high-quality reconstructions across 21,600 frames.

We construct the first markerless deformable interaction dataset recording interactive motions of the hands and deformable objects, called HMDO (Hand Manipulation with Deformable Objects). With our built multi-view capture system, it captures the deformable interactions with multiple perspectives, various object shapes, and diverse interactive forms. Our motivation is the current lack of hand and deformable object interaction datasets, as 3D hand and deformable object reconstruction is challenging. Mainly due to mutual occlusion, the interaction area is difficult to observe, the visual features between the hand and the object are entangled, and the reconstruction of the interaction area deformation is difficult. To tackle this challenge, we propose a method to annotate our captured data. Our key idea is to collaborate with estimated hand features to guide the object global pose estimation, and then optimize the deformation process of the object by analyzing the relationship between the hand and the object. Through comprehensive evaluation, the proposed method can reconstruct interactive motions of hands and deformable objects with high quality. HMDO currently consists of 21600 frames over 12 sequences. In the future, this dataset could boost the research of learning-based reconstruction of deformable interaction scenes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes