CV · Nov 23, 2023

D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

Tsinghua
arXiv:2311.14189v44 citations · h-index: 25
Originality: Highly original
AI Analysis

This work addresses a challenging computer vision problem with applications in robotics and AR/VR, representing a novel methodological approach rather than an incremental improvement.

The paper tackles the problem of reconstructing hand-held objects from single RGB images by introducing a dual-stream conditional diffusion model that addresses centroid deviation and hand-occlusion challenges, achieving state-of-the-art performance on synthetic and real-world datasets.

Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. At its core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction (D-SCo), tackling two predominant challenges. First, to prevent the object centroid from deviating, we utilize a novel hand-constrained centroid fixing paradigm, enhancing the stability of the diffusion and reverse processes and the precision of feature projection. Second, we introduce a dual-stream denoiser to semantically and geometrically model hand-object interactions with a novel unified hand-object semantic embedding, enhancing the reconstruction performance in the hand-occluded region of the object. Experiments on the synthetic ObMan dataset and three real-world datasets (HO3D, MOW, and DexYCB) demonstrate that our approach can surpass all other state-of-the-art methods.
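To make the idea of a centroid-fixed diffusion process more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a standard DDPM-style forward process with a linear noise schedule, and it assumes that the centroid can be held fixed by centering the point cloud at the origin and subtracting the per-sample mean from the Gaussian noise. The function names (`linear_betas`, `centered_noise`, `forward_diffuse`), the schedule values, and the random point cloud are all hypothetical choices made for illustration. The hand-constrained centroid estimate and the dual-stream denoiser described in the abstract are not modeled here.

```python
import numpy as np


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used default)."""
    return np.linspace(beta_start, beta_end, num_steps)


def centered_noise(rng, shape):
    """Gaussian noise with its mean over the points removed.

    Subtracting the mean keeps the noise from shifting the centroid.
    This is an assumption about how centroid fixing could be realized.
    """
    eps = rng.standard_normal(shape)
    return eps - eps.mean(axis=0, keepdims=True)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from a centered point cloud x0 at timestep t."""
    eps = centered_noise(rng, x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - linear_betas())

# Hypothetical object point cloud, shifted so its centroid is the origin.
points = rng.uniform(-1.0, 1.0, size=(2048, 3))
x0 = points - points.mean(axis=0, keepdims=True)

xt, eps = forward_diffuse(x0, t=500, alpha_bar=alpha_bar, rng=rng)

# The centroid of the noised cloud stays at the origin up to float error.
centroid_drift = float(np.abs(xt.mean(axis=0)).max())
print(centroid_drift)
```

Because both `x0` and the noise have zero mean over the points, every `x_t` is a weighted sum of two zero-mean point sets, so its centroid stays at the origin regardless of the timestep. With unconstrained noise the centroid would instead drift randomly as noise is added.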
