Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
This paper addresses the challenge of perceiving 3D object shape from images, with applications such as semantic scene understanding, though it is an incremental improvement over existing retrieval methods.
The paper tackles the problem of retrieving 3D CAD models from a single image in real-world scenarios. It proposes a patchwise embedding method that establishes correspondences between image patches and CAD geometry patches, enabling robust shape retrieval even when the database contains no exact match. Experiments on ScanNet show improved robustness over state-of-the-art methods on in-the-wild imagery.
3D perception of object shapes from RGB image input is fundamental towards semantic scene understanding, grounding image-based perception in our spatially 3-dimensional real-world environments. To achieve a mapping between image views of objects and 3D shapes, we leverage CAD model priors from existing large-scale databases, and propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion, establishing correspondences between patches of an image view of an object and patches of CAD geometry. This enables part similarity reasoning for retrieving CAD models similar to a new image view without exact matches in the database. Our patch embedding provides more robust CAD retrieval for shape estimation in our end-to-end estimation of CAD model shape and pose for detected objects in a single input image. Experiments on in-the-wild, complex imagery from ScanNet show that our approach is more robust than the state of the art in real-world scenarios without any exact CAD matches.
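The idea of retrieving a CAD model through part-level (patch) similarity can be illustrated with a minimal sketch. It assumes that embeddings for the image patches and for each CAD model's geometry patches have already been produced in a shared space by some learned encoder, which is not shown. The aggregation rule is a hypothetical choice made for illustration and is not taken from the paper: each image patch is matched to its most similar CAD patch, and those best-match cosine similarities are averaged into one score per CAD model. The names `patch_score`, `retrieve`, and the toy database are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings in the (assumed) joint space."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def patch_score(image_patches: List[Vector], cad_patches: List[Vector]) -> float:
    """Hypothetical aggregation: mean over image patches of the best CAD-patch match.

    Each image patch is matched to its closest CAD patch, so a CAD model that
    shares parts with the observed object can score well without being an
    exact match for the whole object.
    """
    best = [max(cosine(p, q) for q in cad_patches) for p in image_patches]
    return sum(best) / len(best)


def retrieve(image_patches: List[Vector],
             database: Dict[str, List[Vector]],
             k: int = 1) -> List[Tuple[str, float]]:
    """Rank CAD models in a toy database by patchwise score, highest first."""
    scored = [(cad_id, patch_score(image_patches, patches))
              for cad_id, patches in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings: the image shows a "seat" patch and a "back" patch.
    image = [[1.0, 0.0], [0.0, 1.0]]
    db = {
        "chair_a": [[0.9, 0.1], [0.1, 0.9]],    # has parts resembling both
        "stool_b": [[1.0, 0.0], [0.95, 0.05]],  # only seat-like parts
    }
    # chair_a ranks first: both image patches find a close CAD patch.
    print(retrieve(image, db, k=2))
```

In this toy case `stool_b` contains an exact copy of one image patch, yet `chair_a` ranks higher because it covers both observed parts reasonably well. That is the kind of partial-match behavior the patchwise formulation aims for, although the real system learns its embeddings and may aggregate patch similarities differently.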