CV · Dec 15, 2022

DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention

Zhipeng Luo, Changqing Zhou, Gongjie Zhang, Shijian Lu

arXiv:2212.07849v1 · 16.036 citations · h-index: 68

Originality Highly original

AI Analysis

This paper addresses the problem of accurate and efficient 3D object detection for autonomous driving systems and represents an incremental advance over existing methods.

The paper tackles 3D object detection from multi-view images for autonomous driving by proposing DETR4D, a Transformer-based framework with sparse attention and direct feature query, and reports state-of-the-art results on the nuScenes dataset with improved efficiency.

3D object detection with surround-view images is an essential task for autonomous driving. In this work, we propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images. We design a novel projective cross-attention mechanism for query-image interaction to address the limitations of existing methods in terms of geometric cue exploitation and information loss for cross-view objects. In addition, we introduce a heatmap generation technique that bridges 3D and 2D spaces efficiently via query initialization. Furthermore, unlike the common practice of fusing intermediate spatial features for temporal aggregation, we provide a new perspective by introducing a novel hybrid approach that performs cross-frame fusion over past object queries and image features, enabling efficient and robust modeling of temporal information. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of the proposed DETR4D.
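The geometric step behind querying image features directly from 3D can be illustrated with a small sketch: a 3D reference point is projected into each camera, and only the views where it lands inside the image would contribute features. This is not the paper's implementation. The pinhole model, the 3x4 projection matrices, the two toy cameras, and the names `project_point` and `visible_views` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Matrix3x4 = List[List[float]]  # assumed layout: intrinsics @ [R | t]


def project_point(
    P: Matrix3x4, point: Tuple[float, float, float], width: int, height: int
) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix; return pixel (u, v) or None.

    None is returned when the point is behind the camera or falls
    outside the image bounds.
    """
    homog = (point[0], point[1], point[2], 1.0)
    u_h, v_h, depth = (sum(row[i] * homog[i] for i in range(4)) for row in P)
    if depth <= 1e-6:  # behind (or on) the camera plane
        return None
    u, v = u_h / depth, v_h / depth
    if 0 <= u < width and 0 <= v < height:
        return (u, v)
    return None


def visible_views(
    cameras: List[Matrix3x4],
    point: Tuple[float, float, float],
    width: int,
    height: int,
) -> Dict[int, Tuple[float, float]]:
    """Map camera index -> pixel location for every view that sees the point."""
    hits = {}
    for idx, P in enumerate(cameras):
        uv = project_point(P, point, width, height)
        if uv is not None:
            hits[idx] = uv
    return hits


# Hypothetical two-camera rig (focal length 100, principal point (50, 50)):
# camera 0 looks along +z, camera 1 is rotated 180 degrees about the y axis.
FRONT = [[100.0, 0.0, 50.0, 0.0], [0.0, 100.0, 50.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
REAR = [[-100.0, 0.0, -50.0, 0.0], [0.0, 100.0, -50.0, 0.0], [0.0, 0.0, -1.0, 0.0]]
CAMERAS = [FRONT, REAR]

if __name__ == "__main__":
    print(visible_views(CAMERAS, (0.5, 0.0, 2.0), 100, 100))
```

In a detector of this kind, the returned pixel locations would serve as sampling positions on each view's feature map. An object that straddles two cameras would yield hits in both views, which is the cross-view case the abstract mentions.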
