CV | RO | Oct 19, 2023

FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects

Georgia Tech
arXiv:2310.12974v1 | 21 citations | h-index: 48
Originality: Incremental advance
AI Analysis

The paper addresses inefficiencies in self-supervised 3D recognition for robotics and AR/VR applications. Its contribution is incremental: it builds on existing methods and combines them in a multi-stage training pipeline.

The paper tackles 3D object recognition from a single RGB-D image without real-world 3D labels. On the NOCS test-set it reports a 16.4% absolute improvement in mAP for 6D pose estimation over existing self-supervised baselines, while running at 5 Hz.
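The two headline numbers can be unpacked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the baseline mAP of 20.0 is a made-up placeholder, not a value from the paper. It shows that 5 Hz corresponds to a 200 ms budget per frame, and that an absolute gain is a difference in percentage points rather than a ratio.

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / rate_hz


def absolute_gain(baseline_map: float, new_map: float) -> float:
    """Absolute improvement: difference in mAP percentage points."""
    return new_map - baseline_map


def relative_gain(baseline_map: float, new_map: float) -> float:
    """Relative improvement: difference as a fraction of the baseline."""
    return (new_map - baseline_map) / baseline_map


if __name__ == "__main__":
    # 5 Hz is the rate quoted in the summary.
    print(frame_budget_ms(5.0))

    # Placeholder baseline (NOT from the paper), chosen only to show
    # how a 16.4-point absolute gain differs from the relative gain.
    baseline = 20.0
    improved = baseline + 16.4
    print(absolute_gain(baseline, improved))
    print(relative_gain(baseline, improved))
```

With a different baseline the absolute gain stays at 16.4 points, while the relative gain changes, which is why the summary specifies "absolute".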

In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer from inefficiencies arising from non-end-to-end processing, reliance on separate models for different object categories, and slow surface extraction during the training of implicit reconstruction models, thus hindering both the speed and real-world applicability of the 3D recognition process. Our proposed method leverages a multi-stage training pipeline, designed to efficiently transfer synthetic performance to the real-world domain. This approach is achieved through a combination of 2D and 3D supervised losses during the synthetic domain training, followed by the incorporation of 2D supervised and 3D self-supervised losses on real-world data in two additional learning stages. By adopting this comprehensive strategy, our method successfully overcomes the aforementioned limitations and outperforms existing self-supervised 6D pose and size estimation baselines on the NOCS test-set with a 16.4% absolute improvement in mAP for 6D pose estimation while running in near real-time at 5 Hz.
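The training schedule described in the abstract can be pictured as a small table of stages, each with a data domain and a set of active loss families. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Stage`, `SCHEDULE`, and `total_loss` are hypothetical, the loss values and weights are placeholders, and the abstract does not say how the 2D supervised and 3D self-supervised losses are divided between the two real-world stages, so the sketch simply enables both families in each.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    domain: str              # "synthetic" or "real"
    losses: Tuple[str, ...]  # loss families active in this stage


# Hypothetical schedule mirroring the abstract: one synthetic stage with
# 2D and 3D supervised losses, then two real-world stages with 2D
# supervised and 3D self-supervised losses. How the two real-world
# stages differ is an assumption left open here.
SCHEDULE = (
    Stage("stage_1", "synthetic", ("2d_supervised", "3d_supervised")),
    Stage("stage_2", "real", ("2d_supervised", "3d_self_supervised")),
    Stage("stage_3", "real", ("2d_supervised", "3d_self_supervised")),
)


def total_loss(
    stage: Stage,
    loss_values: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of only the loss families active in this stage.

    Weights default to 1.0; any values passed in are placeholders.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * loss_values[name] for name in stage.losses)


if __name__ == "__main__":
    # Placeholder per-family loss values for one training step.
    step_losses = {
        "2d_supervised": 1.0,
        "3d_supervised": 2.0,
        "3d_self_supervised": 5.0,
    }
    for stage in SCHEDULE:
        print(stage.name, stage.domain, total_loss(stage, step_losses))
```

Keeping the schedule as data makes it easy to see that no 3D supervised loss is applied to real-world images, which is the point of the "without real-world 3D labels" setting.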

Code Implementations: 1 repo