Improving Distant 3D Object Detection Using 2D Box Supervision
This work addresses the shortage of 3D annotations for distant objects in autonomous driving and robotics. It offers a practical way to extend detection range without requiring LiDAR-based 3D labels for those objects, though the contribution is incremental because it builds on existing 3D detection methods.
The paper tackles the detection of distant 3D objects in camera-based perception using only 2D box supervision for those objects. 2D boxes are easier to annotate than 3D boxes, which depend on LiDAR points that become sparse at long range. It proposes LR3D, a framework that learns to estimate depth from 2D boxes using 3D supervision from close objects, enabling detection of objects beyond 200m with accuracy comparable to full 3D supervision.
Improving the detection of distant 3D objects is an important yet challenging task. For camera-based 3D perception, the annotation of 3D bounding boxes relies heavily on LiDAR for accurate depth information. As such, the annotation range is often limited by the sparsity of LiDAR points on distant objects, which hampers the capability of existing detectors in long-range scenarios. We address this challenge by considering only 2D box supervision for distant objects, since 2D boxes are easy to annotate. We propose LR3D, a framework that learns to recover the missing depth of distant objects. LR3D adopts an implicit projection head that learns to generate the mapping between 2D boxes and depth, using the 3D supervision on close objects. This mapping allows depth estimation of distant objects conditioned on their 2D boxes, making long-range 3D detection with 2D supervision feasible. Experiments show that without distant 3D annotations, LR3D allows camera-based methods to detect distant objects (over 200m) with accuracy comparable to full 3D supervision. Our framework is general and could benefit a wide range of 3D detection methods.
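To give some intuition for why a 2D box can carry depth information, the sketch below uses a deliberately simplified pinhole-camera toy model. It is not LR3D's implicit projection head, which is learned by a network; it only illustrates the general idea of fitting a box-to-depth mapping on close objects with known depth and then applying it to distant objects that have 2D boxes only. The relation depth = k / box_height_px (with k standing for focal length times physical object height), the function names, and all sample values are assumptions made for illustration.

```python
# Toy illustration (NOT the LR3D method): fit a 2D-box-to-depth mapping on
# close objects with known depth, then apply it to a distant object that
# only has a 2D box. Assumes a pinhole camera and objects of equal height,
# so that depth = k / box_height_px with k = focal_length_px * height_m.


def fit_scale(close_samples):
    """Least-squares estimate of k in depth = k / box_height_px.

    close_samples: list of (box_height_px, depth_m) pairs from close objects.
    With x = 1 / box_height_px, minimising sum((depth - k * x) ** 2)
    gives k = sum(depth * x) / sum(x * x).
    """
    numerator = sum(depth / height for height, depth in close_samples)
    denominator = sum(1.0 / (height * height) for height, _ in close_samples)
    return numerator / denominator


def predict_depth(k, box_height_px):
    """Estimate depth in metres from the pixel height of a 2D box."""
    return k / box_height_px


# Hypothetical close-range samples: (2D box height in pixels, depth in metres).
close_objects = [(150.0, 10.0), (75.0, 20.0), (50.0, 30.0)]
k = fit_scale(close_objects)

# A hypothetical distant object annotated with a 2D box only.
distant_box_height_px = 7.5
distant_depth_m = predict_depth(k, distant_box_height_px)
print(f"k = {k:.1f}, estimated depth = {distant_depth_m:.1f} m")
```

A fixed closed-form rule like this breaks down as soon as object sizes vary or boxes are truncated, which is one reason a learned mapping supervised on close objects is attractive.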