CV · Dec 15, 2022

DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention

arXiv:2212.07849v1 · 36 citations · h-index: 68
Originality: Highly original
AI Analysis

This paper addresses accurate and efficient 3D object detection for autonomous driving systems and is framed as an incremental advance over existing methods.

The paper tackles 3D object detection from multi-view images for autonomous driving by proposing DETR4D, a Transformer-based framework with sparse attention and direct feature query, and reports state-of-the-art results on the nuScenes dataset with improved efficiency.

3D object detection with surround-view images is an essential task for autonomous driving. In this work, we propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images. We design a novel projective cross-attention mechanism for query-image interaction to address the limitations of existing methods in terms of geometric cue exploitation and information loss for cross-view objects. In addition, we introduce a heatmap generation technique that bridges 3D and 2D spaces efficiently via query initialization. Furthermore, unlike the common practice of fusing intermediate spatial features for temporal aggregation, we provide a new perspective by introducing a novel hybrid approach that performs cross-frame fusion over past object queries and image features, enabling efficient and robust modeling of temporal information. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of the proposed DETR4D.
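The geometric step behind query-image interaction, projecting a 3D reference point into each camera view and reading image features at the resulting pixel, can be illustrated with a minimal NumPy sketch. This is not DETR4D's implementation: the function names, the `lidar2img` matrix layout (intrinsics times extrinsics as a 4x4 matrix), and the nearest-neighbour sampling with a plain average over visible views are all assumptions made for illustration, standing in for the learned projective cross-attention described in the abstract.

```python
import numpy as np


def project_points(points_3d, lidar2img, image_hw):
    """Project N 3D points into V camera views.

    points_3d: (N, 3) points in a shared ego/LiDAR frame.
    lidar2img: (V, 4, 4) hypothetical projection matrices (intrinsics @ extrinsics).
    image_hw:  (height, width) of every image, in pixels.
    Returns pixel coordinates (V, N, 2) and a visibility mask (V, N).
    """
    n = points_3d.shape[0]
    homo = np.concatenate([points_3d, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = np.einsum("vij,nj->vni", lidar2img, homo)              # (V, N, 4)
    depth = cam[..., 2]
    eps = 1e-5
    # Perspective divide; depth is clamped to avoid division by zero.
    uv = cam[..., :2] / np.maximum(depth, eps)[..., None]
    h, w = image_hw
    visible = (
        (depth > eps)
        & (uv[..., 0] >= 0) & (uv[..., 0] < w)
        & (uv[..., 1] >= 0) & (uv[..., 1] < h)
    )
    return uv, visible


def sample_features(feature_maps, uv, visible):
    """Average nearest-neighbour features over the views that see each point.

    feature_maps: (V, H, W, C) per-view feature maps.
    A learned attention module would replace this fixed average.
    """
    v, h, w, _ = feature_maps.shape
    cols = np.clip(np.floor(uv[..., 0]).astype(int), 0, w - 1)
    rows = np.clip(np.floor(uv[..., 1]).astype(int), 0, h - 1)
    view_idx = np.arange(v)[:, None]
    sampled = feature_maps[view_idx, rows, cols]                 # (V, N, C)
    weights = visible[..., None].astype(feature_maps.dtype)
    counts = np.maximum(weights.sum(axis=0), 1.0)                # avoid 0/0
    return (sampled * weights).sum(axis=0) / counts              # (N, C)
```

A point that falls inside several overlapping views receives the average of their features, which hints at why handling cross-view objects matters; a point seen by no camera gets a zero vector in this sketch.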

