CV · Jul 15, 2024

OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

arXiv:2407.10753v1 · 17 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting 3D objects from multiple camera views for autonomous driving systems. It represents an incremental improvement over existing methods.

The paper tackles the problem of inaccurate depth information in multi-view 3D object detection by proposing OPEN, which uses object-wise depth estimation to improve detection accuracy, achieving 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.

Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D detectors due to the lack of the depth of 3D object center; 2) for distant objects, fine-grained depth estimation of the whole object is more challenging. Therefore, we argue that the object-wise depth (or 3D center of the object) is essential for accurate detection. In this paper, we propose a new multi-view 3D object detector named OPEN, whose main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding. Specifically, we first employ an object-wise depth encoder, which takes the pixel-wise depth map as a prior, to accurately estimate the object-wise depth. Then, we utilize the proposed object-wise position embedding to encode the object-wise depth information into the transformer decoder, thereby producing 3D object-aware features for final detection. Extensive experiments verify the effectiveness of our proposed method. Furthermore, OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
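
To make the idea of encoding an object-wise depth into a position embedding more concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a standard pinhole camera model to lift an object's 2D center and its estimated depth to a 3D point, and it swaps in a fixed sinusoidal encoding where the paper's embedding is presumably learned. The function names `backproject_center` and `sinusoidal_embedding`, the intrinsics, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def backproject_center(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with an object-wise depth to a 3D point
    in the camera frame, assuming a pinhole camera model."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def sinusoidal_embedding(point, num_freqs=4, base=10000.0):
    """Encode each 3D coordinate with sin/cos pairs at geometrically
    spaced frequencies (a stand-in for a learned position embedding)."""
    emb = []
    for coord in point:
        for i in range(num_freqs):
            freq = 1.0 / (base ** (i / num_freqs))
            emb.append(math.sin(coord * freq))
            emb.append(math.cos(coord * freq))
    return emb


# Hypothetical object whose 2D center sits on the principal point, 20 m away.
center = backproject_center(u=800.0, v=450.0, depth=20.0,
                            fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
embedding = sinusoidal_embedding(center)
```

In a DETR-style detector, a vector like `embedding` would typically be combined with image features or object queries before the transformer decoder. The sketch only shows why a single depth value at the object center is enough to produce a 3D-aware positional signal.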

Code Implementations: 1 repo
