CV · Nov 14, 2022

Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object Detection without 3D Annotations

arXiv:2211.07108v3 · 1 citation · h-index: 23

Originality Highly original

AI Analysis

This paper addresses the heavy reliance on 3D annotations that limits real-world applications of 3D object detection, offering a novel approach that can also serve as a semi-automatic 3D annotator.

The paper tackles 3D object detection without 3D annotations by proposing Recursive Cross-View (RCV), which converts 3D detection into multiple 2D detection tasks using a recursive paradigm. It outperforms existing image-based methods on the SUN RGB-D and KITTI datasets and achieves 7 fps on a live RGB-D stream.

Heavy reliance on 3D annotations limits the real-world application of 3D object detection. In this paper, we propose a method that does not demand any 3D annotation while still predicting fully oriented 3D bounding boxes. Our method, called Recursive Cross-View (RCV), uses the three-view principle to convert 3D detection into multiple 2D detection tasks, requiring only a subset of 2D labels. We propose a recursive paradigm in which instance segmentation and 3D bounding box generation by Cross-View are performed recursively until convergence. Specifically, our method builds a frustum for each 2D bounding box and then applies the recursive paradigm, which ultimately generates a fully oriented 3D box along with its corresponding class and score. Note that the class and score are given by the 2D detector. Evaluated on the SUN RGB-D and KITTI datasets, our method outperforms existing image-based approaches. To show that our method can be quickly applied to new tasks, we implement it in two real-world scenarios, namely 3D human detection and 3D hand detection. As a result, two new 3D annotated datasets are obtained, which means that RCV can be viewed as a (semi-)automatic 3D annotator. Furthermore, we deploy RCV on a depth sensor, where it achieves detection at 7 fps on a live RGB-D stream. RCV is the first 3D detection method that yields fully oriented 3D boxes without consuming 3D labels.
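The frustum step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, a depth map stored as a list of rows in metres, and a 2D box given as pixel bounds. The function name `frustum_points` and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. The recursive segmentation and Cross-View box fitting that the paper applies afterwards are not shown.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def frustum_points(
    depth: Sequence[Sequence[float]],
    box: Tuple[int, int, int, int],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project the depth pixels inside a 2D box into camera-frame 3D points.

    Assumptions (not taken from the paper): pinhole intrinsics (fx, fy, cx, cy),
    box = (u_min, v_min, u_max, v_max) with exclusive upper bounds, and
    non-positive depth values marking missing measurements.
    """
    u_min, v_min, u_max, v_max = box
    points: List[Point3D] = []
    # Clamp the box to the image so out-of-range detections do not fail.
    for v in range(max(v_min, 0), min(v_max, len(depth))):
        row = depth[v]
        for u in range(max(u_min, 0), min(u_max, len(row))):
            z = row[u]
            if z <= 0:
                continue  # skip pixels with no valid depth
            # Pinhole back-projection: pixel (u, v) at depth z -> (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The points returned for one 2D detection form the frustum point set on which a later stage could separate the object from the background and fit a box.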
