ROCA: Robust CAD Model Retrieval and Alignment from a Single Image
The paper tackles the problem of retrieving and aligning 3D CAD models from a single image, improving retrieval-aware CAD alignment accuracy from 9.5% to 17.6% on ScanNet imagery.
This enables lightweight 3D perception from 2D images for applications such as robotics or AR, although the contribution is incremental in that it builds on existing retrieval and alignment methods.
We present ROCA, a novel end-to-end approach that retrieves and aligns 3D CAD models from a shape database to a single input image. This enables 3D perception of an observed scene from a 2D RGB observation, characterized as a lightweight, compact, clean CAD representation. Core to our approach is our differentiable alignment optimization based on dense 2D-3D object correspondences and Procrustes alignment. ROCA can thus provide a robust CAD alignment while simultaneously informing CAD retrieval by leveraging the 2D-3D correspondences to learn geometrically similar CAD models. Experiments on challenging, real-world imagery from ScanNet show that ROCA significantly improves on the state of the art, from 9.5% to 17.6% in retrieval-aware CAD alignment accuracy.
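To make the Procrustes step concrete, the following is a minimal sketch of weighted rigid Procrustes alignment (the Kabsch solution via SVD) in NumPy. It is not the ROCA implementation. It assumes the dense correspondences have already been lifted to 3D-3D point pairs, which the abstract does not specify, and it recovers only rotation and translation. The function name `procrustes_align`, the `weights` argument, and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np


def procrustes_align(src, dst, weights=None):
    """Find R, t minimizing sum_i w_i * ||R @ src[i] + t - dst[i]||^2.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed given).
    weights:  optional (N,) non-negative correspondence weights.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Weighted centroids, then center both point sets.
    src_mean = w @ src
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # Weighted 3x3 cross-covariance and its SVD.
    H = (src_c * w[:, None]).T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


# Synthetic check: rotate and translate random "CAD" points, then recover the pose.
rng = np.random.default_rng(0)
cad_points = rng.normal(size=(50, 3))
angle = np.pi / 6
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.5, -0.2, 2.0])
scene_points = cad_points @ R_true.T + t_true

R_est, t_est = procrustes_align(cad_points, scene_points)
```

Because the solution is closed-form and built from differentiable operations (weighted means and an SVD), the same computation can in principle sit inside a network and pass gradients back to the predicted correspondences and weights, which is the general idea behind a differentiable alignment optimization.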