CV · Aug 15, 2024

Single-image coherent reconstruction of objects and humans

arXiv:2408.08086v1 · h-index: 4
AI Analysis

This paper addresses the challenge of obtaining globally consistent 3D reconstructions from monocular images of real-world scenes, particularly scenes with multiple interacting humans and objects, which represents a strong gain specific to this domain.

The paper tackles the reconstruction of 3D scenes containing interacting objects and humans from a single image. Existing approaches to this problem often produce colliding meshes and perform poorly under occlusion; the paper reports a significant reduction in collisions and more coherent reconstructions.

Existing methods for reconstructing objects and humans from a monocular image suffer from severe mesh collisions and performance limitations for interacting occluding objects. This paper introduces a method to obtain a globally consistent 3D reconstruction of interacting objects and people from a single image. Our contributions include: 1) an optimization framework, featuring a collision loss, tailored to handle human-object and human-human interactions, ensuring spatially coherent scene reconstruction; and 2) a novel technique to robustly estimate 6 degrees of freedom (DOF) poses, specifically for heavily occluded objects, exploiting image inpainting. Notably, our proposed method operates effectively on images from real-world scenarios, without necessitating scene or object-level 3D supervision. Extensive qualitative and quantitative evaluation against existing methods demonstrates a significant reduction in collisions in the final reconstructions of scenes with multiple interacting humans and objects and a more coherent scene reconstruction.
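To make the idea of a collision loss concrete, here is a minimal sketch. It is not the paper's formulation, which the abstract does not specify. It assumes each human or object is approximated by a few proxy spheres (a simplification chosen here for illustration) and penalizes the squared penetration depth between overlapping spheres. The names `Sphere` and `collision_penalty` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical proxy: (x, y, z, radius). Real systems typically operate on
# meshes or signed distance fields; spheres keep the sketch short.
Sphere = Tuple[float, float, float, float]


def collision_penalty(a: List[Sphere], b: List[Sphere]) -> float:
    """Sum of squared penetration depths between two sets of proxy spheres.

    Returns 0.0 when no sphere in `a` overlaps any sphere in `b`.
    """
    total = 0.0
    for ax, ay, az, ar in a:
        for bx, by, bz, br in b:
            dist = math.dist((ax, ay, az), (bx, by, bz))
            depth = (ar + br) - dist  # positive only when the spheres overlap
            if depth > 0.0:
                total += depth * depth
    return total


# A "human" sphere and an "object" sphere, first overlapping, then separated.
human = [(0.0, 0.0, 0.0, 1.0)]
object_overlapping = [(1.5, 0.0, 0.0, 1.0)]
object_separated = [(2.5, 0.0, 0.0, 1.0)]

penalty_overlapping = collision_penalty(human, object_overlapping)  # 0.25
penalty_separated = collision_penalty(human, object_separated)      # 0.0
```

In an optimization framework of the kind the abstract describes, a term like this would be one of several minimized jointly over the poses of all scene elements, so that reducing overlap is traded off against agreement with the image evidence.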

