CV | Jun 15, 2023

CAD-Estate: Large-scale CAD Model Annotation in RGB Videos

arXiv:2306.09011v2 | 14 citations | h-index: 86 | Has Code
Originality: Incremental advance
AI Analysis

The paper contributes a dataset for 3D object reconstruction and pose estimation in computer vision. Its annotation method is suited to crowd-sourcing, and pre-training on the dataset improves downstream results. The advance is incremental: it scales up existing annotation methods.

The paper tackles the problem of annotating RGB videos with globally-consistent 3D CAD models for objects, resulting in a large-scale dataset with 101k instances and 12k unique models, which is 7x more instances and 4x more unique models than the largest existing dataset.

We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects. We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation. Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor. Many steps are performed automatically, and the tasks performed by humans are simple, well-specified, and require only limited reasoning in 3D. This makes them feasible for crowd-sourcing and has allowed us to construct a large-scale dataset by annotating real-estate videos from YouTube. Our dataset CAD-Estate offers 101k instances of 12k unique CAD models placed in the 3D representations of 20k videos. In comparison to Scan2CAD, the largest existing dataset with CAD model annotations on real scenes, CAD-Estate has 7x more instances and 4x more unique CAD models. We showcase the benefits of pre-training a Mask2CAD model on CAD-Estate for the task of automatic 3D object reconstruction and pose estimation, demonstrating that it leads to performance improvements on the popular Scan2CAD benchmark. The dataset is available at https://github.com/google-research/cad-estate.
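A 9-DoF pose is commonly read as three translation, three rotation, and three per-axis scale parameters. The abstract does not state how CAD-Estate parametrizes the pose, so the following is only a minimal sketch under assumed conventions: Euler angles in radians composed as Rz·Ry·Rx, with scale applied first, then rotation, then translation. The function names `rotation_matrix` and `apply_pose` are hypothetical and are not taken from the dataset's code.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Return a 3x3 rotation R = Rz * Ry * Rx (assumed Euler convention, radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(point, translation, euler, scale):
    """Map a point from CAD-model coordinates into scene coordinates.

    Assumed order: per-axis scale (3 DoF), then rotation (3 DoF),
    then translation (3 DoF), for 9 DoF in total.
    """
    scaled = [p * s for p, s in zip(point, scale)]
    rot = rotation_matrix(*euler)
    rotated = [sum(rot[i][j] * scaled[j] for j in range(3)) for i in range(3)]
    return [r + t for r, t in zip(rotated, translation)]


if __name__ == "__main__":
    # Stretch a unit point 2x along x, rotate 90 degrees about z, then translate.
    placed = apply_pose(
        point=(1.0, 0.0, 0.0),
        translation=(1.0, 2.0, 3.0),
        euler=(0.0, 0.0, math.pi / 2),
        scale=(2.0, 1.0, 1.0),
    )
    print([round(v, 6) for v in placed])
```

Allowing a separate scale per axis lets a single CAD model be stretched to fit objects of different proportions, which is one plausible reason to use 9 rather than 7 degrees of freedom.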

Code Implementations: 1 repo