Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
This work addresses the cost of manual annotation for 3D scene understanding and offers a way to reduce that cost while improving model performance. The contribution is incremental, however, since it applies an existing pipeline to a new dataset.
The paper addresses the problem of generating accurate 3D annotations for training deep learning models in 3D scene understanding. It uses automatically retrieved synthetic CAD models as high-quality ground truth and finds that models trained on these annotations outperform those trained on manually annotated data on two tasks: point cloud completion and single-view CAD model retrieval and alignment.
High-level 3D scene understanding is essential in many applications. However, the challenges of generating accurate 3D annotations make the development of deep learning models difficult. We turn to recent advancements in automatic retrieval of synthetic CAD models, and show that data generated by such methods can be used as high-quality ground truth for training supervised deep learning models. More precisely, we employ a pipeline akin to the one previously used to automatically annotate objects in ScanNet scenes with their 9D poses and CAD models. This time, we apply it to the recent ScanNet++ v1 dataset, which previously lacked such annotations. Our findings demonstrate that not only is it possible to train deep learning models on these automatically obtained annotations, but the resulting models also outperform those trained on manually annotated data. We validate this on two distinct tasks: point cloud completion and single-view CAD model retrieval and alignment. Our results underscore the potential of automatic 3D annotations to enhance model performance while significantly reducing annotation costs. To support future research in 3D scene understanding, we will release our annotations, which we call SCANnotate++, along with our trained models.
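
For readers unfamiliar with the term, a 9D pose is commonly understood as three translation, three rotation, and three scale parameters that place a CAD model in a scene. The following is a minimal sketch of how such a pose could be turned into a 4x4 transform and applied to a CAD vertex. The function names, the Euler-angle parameterization, and the Z-Y-X rotation order are assumptions made for illustration; the abstract does not specify how the annotations store rotation.

```python
import math


def pose_9d_to_matrix(translation, euler_xyz, scale):
    """Compose a 4x4 transform M = T * R * S from a 9-DoF pose.

    Assumption (not from the paper): rotation is given as Euler angles in
    radians about the x, y, z axes and composed as R = Rz * Ry * Rx.
    """
    rx, ry, rz = euler_xyz
    cos_x, sin_x = math.cos(rx), math.sin(rx)
    cos_y, sin_y = math.cos(ry), math.sin(ry)
    cos_z, sin_z = math.cos(rz), math.sin(rz)

    # Rotation matrix R = Rz * Ry * Rx, written out explicitly.
    rot = [
        [cos_z * cos_y,
         cos_z * sin_y * sin_x - sin_z * cos_x,
         cos_z * sin_y * cos_x + sin_z * sin_x],
        [sin_z * cos_y,
         sin_z * sin_y * sin_x + cos_z * cos_x,
         sin_z * sin_y * cos_x - cos_z * sin_x],
        [-sin_y, cos_y * sin_x, cos_y * cos_x],
    ]

    # R * S scales column j of R by scale[j]; translation fills the last column.
    matrix = [
        [rot[i][j] * scale[j] for j in range(3)] + [translation[i]]
        for i in range(3)
    ]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix


def transform_point(matrix, point):
    """Map a point from CAD model coordinates into scene coordinates."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3]
    )


# Hypothetical pose: uniform scale 2, 90 degree turn about z, lift by 1 in z.
pose = pose_9d_to_matrix(
    translation=(0.0, 0.0, 1.0),
    euler_xyz=(0.0, 0.0, math.pi / 2),
    scale=(2.0, 2.0, 2.0),
)
scene_point = transform_point(pose, (1.0, 0.0, 0.0))
```

Applying scale first, then rotation, then translation keeps the scale expressed in the CAD model's own axes, which is the usual convention for aligning a canonical model to a scanned object.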