CV | AI | Feb 16, 2023

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

arXiv:2302.08231v3 | 8 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in multi-camera 3D perception for autonomous driving, offering an incremental improvement over existing methods.

The paper targets limited multi-view feature integration in camera-based 3D object detection for autonomous driving. It proposes 3M3D, which updates both the multi-view features and the query features using self-attention and multi-representation queries, and reports performance improvements on the nuScenes benchmark dataset.

3D visual perception based on multi-camera images is essential for autonomous driving systems. Recent work in this field performs 3D object detection by taking multi-view images as input and iteratively refining object queries (object proposals) through cross-attention to multi-view features. However, the individual backbone features are not updated with multi-view information; they remain a mere collection of single-image backbone outputs. We therefore propose 3M3D, a multi-view, multi-path, multi-representation approach to 3D object detection, in which we update both the multi-view features and the query features to enhance the scene representation in both a fine panoramic view and a coarse global view. First, we update the multi-view features with self-attention along the multi-view axis, which incorporates panoramic information into the features and improves understanding of the global scene. Second, we update the multi-view features with self-attention over ROI (Region of Interest) windows, which encodes finer local details and exchanges information not only along the multi-view axis but also along the other spatial dimension. Finally, we exploit the multiple representations of queries in different domains to further boost performance: we use sparse floating queries together with dense BEV (Bird's Eye View) queries, whose outputs are post-processed to filter duplicate detections. We show performance improvements over our baselines on the nuScenes benchmark dataset.
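
To make the first step concrete, the following is a minimal sketch of self-attention along the multi-view axis. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how tokens are grouped or projected, so the sketch assumes that each spatial location attends across camera views at that same location, and it omits learned query/key/value projections, multiple heads, and positional encodings. The function name `multi_view_axis_attention` and the nested-list layout `feats[v][h][w]` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_view_axis_attention(feats):
    """Single-head self-attention across the view axis.

    feats[v][h][w] is a length-C feature vector for camera view v at
    spatial location (h, w). ASSUMPTION: tokens are grouped per (h, w),
    and queries, keys, and values are the raw features (no learned
    projections), which keeps the sketch short.
    """
    V, H, W = len(feats), len(feats[0]), len(feats[0][0])
    C = len(feats[0][0][0])
    scale = 1.0 / math.sqrt(C)  # standard scaled dot-product factor
    out = [[[None] * W for _ in range(H)] for _ in range(V)]
    for h in range(H):
        for w in range(W):
            # One token per view at this spatial location.
            tokens = [feats[v][h][w] for v in range(V)]
            for v in range(V):
                query = tokens[v]
                scores = [
                    scale * sum(a * b for a, b in zip(query, key))
                    for key in tokens
                ]
                weights = softmax(scores)
                # Weighted sum of the per-view value vectors.
                out[v][h][w] = [
                    sum(wt * tok[c] for wt, tok in zip(weights, tokens))
                    for c in range(C)
                ]
    return out


if __name__ == "__main__":
    # Two views, one spatial location, two channels.
    feats = [[[[1.0, 0.0]]], [[[0.0, 1.0]]]]
    print(multi_view_axis_attention(feats))
```

After this update, the feature vector of each view is a convex combination of the features from all views at that location, so it is no longer a purely single-image output. The ROI-window step described in the abstract would apply the same kind of attention over tokens gathered from a local window rather than a single location.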
