CV | Aug 23, 2022

DeepInteraction: 3D Object Detection via Modality Interaction

Zeyu Yang, Jiaqi Chen, Zhenwei Miao, Wei Li, Xiatian Zhu, Li Zhang

arXiv:2208.11112v431.2224 citations | h-index: 16 | Has Code

Originality: Highly original

AI Analysis

This work addresses a fundamental bottleneck in 3D object detection for autonomous driving by proposing a novel interaction strategy, showing significant performance gains.

The paper tackles the limitation of existing multi-modal fusion strategies in 3D object detection by introducing a modality interaction strategy that learns and maintains individual per-modality representations so that their unique characteristics can be exploited. The resulting method surpasses prior art, often by a large margin, and ranks first on the nuScenes leaderboard.

Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design, however, is fundamentally restricted because it overlooks useful modality-specific information, which ultimately hampers model performance. To address this limitation, we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, so that their unique characteristics can be exploited during object detection. To realize this strategy, we design the DeepInteraction architecture, characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that our method surpasses all prior art, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.
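The difference between fusing modalities and letting them interact can be illustrated with a toy sketch. This is not the DeepInteraction architecture: it assumes plain dot-product cross-attention with no learned projections, and the names `cross_attend`, `interact`, `lidar_feats`, and `image_feats` are hypothetical. It only shows the idea stated in the abstract, that each modality keeps its own representation while drawing information from the other.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query vector gathers a softmax-weighted mix of context vectors."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / scale for c in context]
        weights = softmax(scores)
        mixed = [sum(w * c[d] for w, c in zip(weights, context))
                 for d in range(len(q))]
        out.append(mixed)
    return out

def interact(lidar, image):
    """One interaction step: both streams are updated and both are kept.

    Assumption: a residual add stands in for whatever update rule a real
    model would learn.
    """
    lidar_from_image = cross_attend(lidar, image)
    image_from_lidar = cross_attend(image, lidar)
    new_lidar = [[x + y for x, y in zip(l, m)]
                 for l, m in zip(lidar, lidar_from_image)]
    new_image = [[x + y for x, y in zip(i, m)]
                 for i, m in zip(image, image_from_lidar)]
    return new_lidar, new_image

# Toy features (made-up values): 3 LiDAR tokens and 2 image tokens, dimension 2.
lidar_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.5, 0.5], [1.0, -1.0]]
new_lidar, new_image = interact(lidar_feats, image_feats)
```

A fusion-style design would instead merge the two inputs into a single representation at this point; here `interact` returns two outputs with the same shapes as its inputs, so later stages can still use modality-specific information.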
