CV · Mar 18, 2021

Suppress-and-Refine Framework for End-to-End 3D Object Detection

Zili Liu, Guodong Xu, Honghui Yang, Minghao Chen, Kuoliang Wu, Zheng Yang, Haifeng Liu, Deng Cai

arXiv:2103.10042v23.74 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses inefficiencies in 3D object detection for applications like robotics and autonomous systems, though it is incremental as it builds on VoteNet.

The paper addresses the fact that voting-based 3D object detectors rely on handcrafted components to eliminate redundant boxes, which makes them non-end-to-end and slow. It proposes a suppress-and-refine framework that removes these components and reports state-of-the-art performance on the ScanNetV2 and SUN RGB-D datasets at what the authors describe as the fastest speed to date.

3D object detectors based on Hough voting have achieved great success and inspired many follow-up works. Although these works keep improving detection accuracy, they rely on handcrafted components to eliminate redundant boxes, and are therefore non-end-to-end and time-consuming. In this work, we propose a suppress-and-refine framework to remove these handcrafted components. To exploit full-resolution information and achieve real-time speed, it directly consumes feature points and redundant 3D proposals. Specifically, it first suppresses noisy 3D feature points and then feeds them to 3D proposals for the subsequent RoI-aware refinement. With a gating mechanism to build fine proposal features and a self-attention mechanism to model relationships, our method produces high-quality predictions with a small computation budget in an end-to-end manner. Building on VoteNet, we present SRDet, the first fully end-to-end 3D detector. It achieves state-of-the-art performance on the challenging ScanNetV2 and SUN RGB-D datasets at the fastest speed reported to date. Our code will be available at https://github.com/ZJULearning/SRDet.
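
The data flow described in the abstract (suppress feature points, pool them into proposals with a gate, then relate proposals with self-attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the top-k suppression rule, the axis-aligned box format, the sigmoid gate with a single weight matrix `w_gate`, the weight-free single-head attention over proposals, and all function names are assumptions chosen only to make the pipeline concrete. The actual SRDet layers are in the linked repository.

```python
import numpy as np


def suppress_points(feats, xyz, scores, keep):
    """Keep the `keep` highest-scoring feature points (assumed suppression rule)."""
    idx = np.argsort(-scores)[:keep]
    return feats[idx], xyz[idx]


def points_in_boxes(xyz, boxes):
    """Boolean (M, N) mask: point n lies inside axis-aligned box m.

    boxes: (M, 6) as (cx, cy, cz, dx, dy, dz), with d* the full side lengths.
    """
    centers, half = boxes[:, None, :3], boxes[:, None, 3:] / 2.0
    return np.all(np.abs(xyz[None, :, :] - centers) <= half, axis=-1)


def gated_roi_pool(feats, mask, w_gate):
    """Average gated point features inside each box (hypothetical gating form)."""
    gate = 1.0 / (1.0 + np.exp(-(feats @ w_gate)))           # (N, C) sigmoid gate
    gated = gate * feats                                      # (N, C)
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)   # avoid divide-by-zero
    return (mask.astype(feats.dtype) @ gated) / counts        # (M, C)


def self_attention(x):
    """Single-head scaled dot-product attention over proposals, no learned weights."""
    logits = x @ x.T / np.sqrt(x.shape[1])
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


# Toy inputs: 256 random feature points with 8-dim features and 3 made-up proposals.
rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(256, 3))
feats = rng.normal(size=(256, 8))
scores = rng.uniform(size=256)
boxes = np.array([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
                  [0.5, 0.5, 0.5, 0.8, 0.8, 0.8],
                  [-0.5, -0.5, 0.0, 0.6, 0.6, 2.0]])
w_gate = rng.normal(size=(8, 8))

kept_feats, kept_xyz = suppress_points(feats, xyz, scores, keep=128)
mask = points_in_boxes(kept_xyz, boxes)
roi_feats = gated_roi_pool(kept_feats, mask, w_gate)
refined, attn = self_attention(roi_feats)
print(refined.shape, attn.shape)
```

In this sketch no non-maximum suppression or other box-level post-processing appears anywhere, which is the sense in which such a pipeline can be trained and run end to end. How the refined proposal features are turned into final boxes and scores is not specified in the abstract and is left out here.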

